
Hi HANA EIM and SDI Community,


In this blog entry I would like to share some of the experiences we gained throughout an SDI HANA to HANA (H2H) implementation project. To set the context, we will start off with the scenario description and the solution architecture.

These are the items that will be covered throughout this blog:

  1. Implementation Scope

  2. Solution Architecture

  3. Best Practices

  4. Challenges

  5. Reengineering of Replication Tasks

  6. Monitoring

  7. Real-time Replication & Source System Archiving

  8. References


You can expect practical insights into the implementation of a HANA to HANA replication scenario, including details on topics such as task partitioning, replication task design and monitoring. You may be able to adapt the approaches described in this blog to your own SDI implementation project.

 

1.    Implementation Scope


From an SDI perspective, this brief overview describes the facts and requirements we had to deal with:

  • Replicate data in real-time from 3 different HANA source systems into a (consolidated) target schema using SDI RTs (with the SDI HANAAdapter)

  • Replication scope of approx. 550 tables per source (times 3, i.e. more than 1,600 tables in total)

  • Replicate tables with high record counts (6 tables with more than 2 billion records each in production)

  • SDI task partitioning for large tables (> 200 million records)

  • Target table partitioning for large tables (> 200 million records)

  • SDI infrastructure/configuration – e.g. DP-Agent + Agent groups

  • Follow SDI best practice guidelines (naming convention, implementation guidelines, tuning)

  • SDI development artifacts maintenance + transport across landscape to PRD

  • Dpserver + dpagent monitoring

  • Out of scope: Load and replication of IBM DB2 based source systems (compare with architectural diagram)


 

2.    Solution Architecture


The end-to-end solution architecture employs several SAP and non-SAP components:

  • DP-Agents

    • Virtual host on Linux, 64 GB

    • Version 2.1.1



  • HANA 2 SP02

    • 4 TB



  • HANA EIM SDI (XSC runtime)

  • DLM

  • HANA Vora 1.4

  • Hadoop Cluster with Spark enabled

  • MicroStrategy


The following illustration shows the architecture in a simplified way. From an SDI point of view there are multiple real-time and batch input streams: Suite on HANA systems, files, and legacy data from IBM DB2 databases (not shown).

In the productive environment (as shown), each Suite on HANA system (HDB1/2/3) is connected via a dedicated DP-Agent group with its own HANAAdapter instance. This mitigates the risk of stalling the entire replication when remote source or RepTask exceptions occur at source system level. The Hadoop and Vora parts, shown on the right-hand side, will not be further elaborated and are not part of this blog entry.



 

3.    SDI Best Practices


Initially, most of the aspects (for users and authorizations) covered in the official SDI Best Practices Guide were implemented (refer to the references section for the link to the best practices).

SDI users were organized the following way:

  • SDI_ADMIN - monitoring privileges, user creation, ***

  • SDI_DEV - Web-based development workbench, repository privileges, schema privileges

  • SDI_EXEC - execute replication tasks

  • SDI_TRANSPORT - transport SDI artifacts


Using this pattern, you can easily follow a segregation-of-duties approach and avoid unnecessary and unwanted situations in development or deployment. On the other hand, you have to stick with the approach and align your development and administration processes accordingly.
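
As a minimal sketch of what this can look like (the exact privilege sets are listed in the SDI Best Practices Guide; the statements below are illustrative and incomplete, and the remote source name "RS_HDB1" is a placeholder):

--Illustrative only: create one of the SDI users and grant a few typical SDI privileges
CREATE USER SDI_DEV PASSWORD "<initial_password>" NO FORCE_FIRST_PASSWORD_CHANGE;
GRANT AGENT ADMIN, ADAPTER ADMIN TO SDI_ADMIN;
GRANT CREATE VIRTUAL TABLE, CREATE REMOTE SUBSCRIPTION ON REMOTE SOURCE "RS_HDB1" TO SDI_DEV;
GRANT PROCESS REMOTE SUBSCRIPTION EXCEPTION ON REMOTE SOURCE "RS_HDB1" TO SDI_ADMIN;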

 

4.    SDI Real-time Replication Design – Challenges


The following describes the major challenges we faced:

  1. Multiple sources into one target

  2. Replication Task Count

  3. Duration of Initial Load and Cut-over Time

  4. Disable the Capture of DDL Changes



  • 1. Multiple sources into one target

    • The goal is to consolidate all source data (from the SAP ECC systems) into one target schema. That means source tables of the same type are replicated into one consolidated target table.

    • Translated into a specific example:




The replication of e.g. table COEP is set up for all three source systems. The target for all sources is one table COEP. This target table must comply with all structural properties that exist in the sources (COEP tables across n source systems do not necessarily have exactly the same structure, at least in the given case; you might have system-specific appends), meaning different Z-/Y-fields across the COEP source tables. As RTs do not allow for flexible column mapping like FGs, this is how we resolved the requirement:
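
The screenshots illustrating the solution are not reproduced here. As a minimal sketch of the idea (all names and data types are placeholders, and only a few of the COEP key fields are spelled out): the consolidated target carries the superset of all source-specific appends plus an additional column identifying the source system (compare section 5, where the values '001'/'002'/'003' are checked):

--Illustrative only: consolidated target table with the superset of source-specific Z-/Y-appends
--and an additional column identifying the source system
CREATE COLUMN TABLE "TARGET_SCHEMA"."COEP" (
   "SOURCE_SYSTEM" NVARCHAR(3)  NOT NULL,   -- '001' / '002' / '003'
   "MANDT"         NVARCHAR(3)  NOT NULL,
   "KOKRS"         NVARCHAR(4)  NOT NULL,
   "BELNR"         NVARCHAR(10) NOT NULL,
   "BUZEI"         NVARCHAR(3)  NOT NULL,
   -- ...remaining standard COEP columns...
   "ZZFIELD_HDB1"  NVARCHAR(20),            -- append that only exists in one source
   "ZZFIELD_HDB3"  NVARCHAR(20),            -- append that only exists in another source
   PRIMARY KEY ("SOURCE_SYSTEM", "MANDT", "KOKRS", "BELNR", "BUZEI")
);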





 

  • 2. Replication Task Count

    • In scope were about 550 tables to be replicated, with different volumes and different change/delta frequencies. For the largest tables, dedicated remote sources were planned to be introduced; this translates into a remote source count of 7 for the largest SoH source.
      Since each remote source gets its own receiver/applier pair assigned (in the dpserver), this makes sense from a parallelization and performance perspective. On the downside, you have to create dedicated source system users, and you need to maintain and organize your solution considering the set of different remote sources that are in place. In the following illustration, each red arrow represents a separate remote source.




 



  • 3. Duration of Initial Load and Cut-over Time

    • We observed quite a few tables whose initial load consumed a lot of time (> 24 hours per table). By introducing task partitioning within the RepTasks, combined with a proper loading sequence, we could achieve major performance improvements. The challenge here is to find appropriate columns and value ranges. Imagine a table such as MARC (SoH material master plant data) with 1.9 billion records for which you should define proper ranges for your range-partitioning approach. How do you do that? The solution is to profile your data with generic SQL procedures or other tools. Perhaps you also have experts at hand who know the value distribution in those tables. This task can be tricky.

    • Check value distribution of columns that are candidates for partitioning:
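
For example (a sketch only; schema, table and column names are placeholders), a simple GROUP BY on the virtual table gives a first impression of the value distribution:

--Profile a partitioning candidate column on the source via its virtual table
SELECT "WERKS" AS candidate_value, COUNT(*) AS record_count
FROM "SDI_SOURCE"."VT_HDB1_MARC"
GROUP BY "WERKS"
ORDER BY record_count DESC;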






 



    • Apply partitioning settings in RT; here: Range Partitioning on field STUNR with two parallel partitions for the initial load:
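
In addition to the task partitioning configured in the RepTask editor, the large target tables were range partitioned as well (see the scope in section 1). In plain SQL this can look like the following sketch; table, column and boundary values are placeholders:

--Illustrative only: range partitioning of a large target table
ALTER TABLE "TARGET_SCHEMA"."COEP"
   PARTITION BY RANGE ("BELNR")
   (PARTITION '0000000000' <= VALUES < '5000000000',
    PARTITION OTHERS);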






 

  • 4. Disable the Capture of DDL Changes

    • The HANAAdapter is able to capture DDL changes (drop/add column). Obviously, if you load in real time from three sources into one target table, this influences the handling of DDL changes considerably: you cannot capture DDL changes anymore, as the (dynamic) structural differences between the source tables would cause inserts on the SDI applier side to fail.

    • A good option is therefore to set the DDL scan interval to 0, which means “disabled”. The default value is 10 (minutes), as you can see in the picture below:






 

5.    Reengineering Replication Tasks


Throughout the project, several requirement changes occurred, e.g. regarding RT structural details such as conversion expressions (you might know such change requests from your own projects 😉).

Therefore, some PowerShell and Python scripts were implemented to apply mass changes to the SDI objects. When you have around 1,600 RepTasks, you will be grateful not to have to touch each of them one by one. Doing this, you need to take precautions wherever possible: keep backups of the exports you take from the web-based development workbench. After exporting your SDI artifacts from the workbench, you can apply changes to the XML structure. You can also do this by opening a replication task in the workbench via right-click -> Open in text editor.

Whatever you do in terms of editing the XML directly, be aware of the find/replace operations you undertake and make sure that you only edit the XML properties you really need to change; otherwise your replication task structure is likely to get corrupted. After applying your changes locally, you can re-import the replication task into the web-based development workbench.

Here are a couple of examples we had to deal with. These were resolved by applying the respective scripts or by using common tools such as Notepad++ or Sublime Text.

  1. Replace the “Project Expression” in all RepTasks for a defined scope of columns (specific for each table) and enter proper conversion expressions, such as ABAP date NVARCHAR(8) -> HANA DATE or ABAP time NVARCHAR(6) -> HANA TIME.




 

As all RepTasks were initially designed to conduct a 1:1 replication (only one table per RepTask and no further conversion or project expressions), a Python script executed the following steps:

  • Iterate directory where RepTasks are stored, element by element

  • Match RepTask file with respective line in provided csv (proper identifiers required)

  • Find XML attribute for affected columns and replace element value “Project Expression”


In such a Python script, the class xml.etree.ElementTree helps to parse and edit the RepTask XML structures.

  2. Check the consistency and correctness of the values identifying the source system, which is defined at table level via an additional column:

    1. Source system 1 = ‘001’

    2. Source system 2 = ‘002’

    3. Source system 3 = ‘003’




Moreover, check if all RepTasks apply the correct filter criteria. This can also be done using e.g. Notepad++ to “scan” entire directories for certain patterns/values.

 

6.    SDI-Monitoring


Monitoring the SDI solution involves both the target HANA system and the DP-Agent(s). Apart from these “core” components, many sub-components come into play. It would be wrong to state that monitoring the DP-Agent and the DP framework on the HANA side alone is enough to keep your SDI solution in safe hands.

 



 

As SDI is a framework that is deeply embedded into the HANA database, you ideally need to understand all the inner mechanisms that come into play when data is processed. In this blog, however, we only dedicate our attention to SDI framework related monitoring options. Please do not expect to be able to fully monitor or troubleshoot your SDI solution after reading this; the following only provides some basic input and approaches we applied and experienced in our project.

To begin with, and most likely well known, there are the SDI monitoring views. These views unveil quite a lot of details about SDI design-time objects, ongoing loads and general statuses. They also provide some control mechanisms to handle replication and loads. However, when it comes to more detailed root-cause analysis, one needs to be familiar with the framework's tables, how traces are set, where log files are stored, etc. A good reference here is the official SDI Administration Guide (see the references section for the links).

In addition to the monitoring views, a solid set of prepared SQL statements or some custom-built stored procedures can of course help to facilitate your monitoring or troubleshooting activities. E.g. during the initial load of your RepTasks you typically keep querying some important views. Especially the M_REMOTE_SUBSCRIPTIONS view is helpful when it comes to monitoring a particular initial load or the CDC states. Its STATE column contains one of the following replication states, which should be well understood (you can also find this information in the SDI Administration Guide):

  • CREATED: The remote subscription is created by the replication task.

  • MAT_START_BEG_MARKER: The receiver is waiting for the begin marker that indicates the first changed data to queue while the initial load is running.

  • MAT_START_END_MARKER: The receiver queues the rows and is waiting for the end marker that indicates the last row of the initial load.

  • MAT_COMP_BEG_MARKER: The receiver is waiting for the begin marker that indicates the first row to queue after the initial load has completed.

  • MAT_COMP_END_MARKER: The receiver queues the changed rows and is waiting for the end marker that indicates the last row of the initial load. The initial load has completed and the end marker is sent to the adapter. If the state does not change to AUTO_CORRECT_CHANGE_DATA, the adapter or source system is slow in capturing the changes.

  • AUTO_CORRECT_CHANGE_DATA: When the end marker is received for the initial load, the applier loads the changed data captured (and queued during the initial load) to the target. If a lot of changes occurred after the initial load started, this state might take a long time to change to APPLY_CHANGE_DATA.

  • APPLY_CHANGE_DATA: All of the changes captured while the initial load was running have been applied, and ongoing changes are now loaded to the target.


 

Personally, we perceived the use of the monitoring views, as well as the SDA monitor in HANA Studio, as somewhat cumbersome. This is subjective and does not apply to all cases, of course. Below I present a set of helpful statements:
--Check for current source, if all subscriptions are in state APPLY_CHANGE_DATA – show differing subscriptions
SELECT * FROM M_REMOTE_SUBSCRIPTIONS WHERE SCHEMA_NAME = '<VIRTUAL_TABLE_SCHEMA_SOURCE>' AND STATE != '' AND STATE != 'APPLY_CHANGE_DATA';

--Check how many reptasks/remote subscriptions are in real-time replication state per remote source
SELECT SCHEMA_NAME, COUNT(*) FROM M_REMOTE_SUBSCRIPTIONS WHERE STATE = 'APPLY_CHANGE_DATA' GROUP BY SCHEMA_NAME;

--Query for exceptions on remote source/subscription level
SELECT * FROM M_REMOTE_SUBSCRIPTION_EXCEPTIONS;

--Check the applied remote statements during the initial load (SDA request) for a given table. You need to pick out the correct statement as multiple statements might match your search pattern
SELECT * FROM "SYS"."M_REMOTE_STATEMENTS" WHERE REMOTE_STATEMENT_STRING LIKE '%MARC%';

--Find running tasks. The initial load in the SDI context is wrapped into a HANA runtime task. Therefore you can keep track of its execution by querying the appropriate tables
SELECT TOP 100 * FROM "_SYS_TASK"."TASK_EXECUTIONS_" ORDER BY START_TIME DESC;

--In case of partitioned tasks, query the subsequent table
SELECT TOP 100 * FROM "_SYS_TASK"."TASK_PARTITION_EXECUTIONS_" ORDER BY START_TIME DESC;
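
One more view that can be useful (it is not part of the statement list above, but worth mentioning) is the remote subscription statistics view with its received/applied counters and timestamps per subscription:

--Check received/applied counters and timestamps per remote subscription
SELECT * FROM M_REMOTE_SUBSCRIPTION_STATISTICS;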

 

 Monitoring the DP-Agent

To keep track of what happens on the DP-Agent side, it is advisable to have a real-time log monitoring tool at hand, such as Array or Baretail, or, when running on Linux, a native tool of your choice (e.g. tail). Set the trace and log levels in the DP-Agent's ini files, make yourself familiar with the paths where traces and logs are stored, and access them with your tool of preference.

Here is a brief overview of the dpagentconfig.ini file properties for logging:

  • log.level = ALL/INFO/DEBUG - sets the trace level

  • log.maxBackupIndex = 10 - maximum number of trace files

  • log.maxFileSize = 10MB - maximum size of a trace file

  • trace.body = TRUE - log the entire body part of an operation


Correspondingly, you should see the respective trace/log files in the log path:

  • trc - trace file for the agent

  • Framework_alert - alert file for the framework

  • log - Log Reader rep agent log files

  • log - service log files


 

Data Assurance

In order to assure that source and target record counts are the same, you can use some prepared SQL statements to query the record counts of the source and target tables.

Possibly, you want to provide a SQL procedure that automatically matches source and target tables in terms of record counts across remote sources, e.g. taking into account the additional source system column in the consolidated target tables.
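
A minimal sketch of such a check for a single table and a single source system (all object names are placeholders; the virtual table points to the source, and the target carries the additional source system column):

--Compare source and target record counts for one table and one source system
SELECT
   (SELECT COUNT(*) FROM "SDI_SOURCE"."VT_HDB1_COEP") AS source_count,
   (SELECT COUNT(*) FROM "TARGET_SCHEMA"."COEP" WHERE "SOURCE_SYSTEM" = '001') AS target_count
FROM DUMMY;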

 

7.    Real-time Replication & Source System Archiving


Replication using the HANAAdapter is trigger based.

That is, database triggers exist on table level and react to DML operations such as INSERT/UPDATE/DELETE. If a table is being replicated and a record in the table is archived, the database trigger interprets the archiving operation as a DELETE statement, and SDI deletes the corresponding record from the target table.
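
If you want to verify this on the source side, you can for example list the triggers that the HANAAdapter created for a replicated table (a sketch; the table name is a placeholder):

--Source-side check: triggers created for a replicated table
SELECT TRIGGER_NAME, SUBJECT_TABLE_SCHEMA, SUBJECT_TABLE_NAME
FROM "SYS"."TRIGGERS"
WHERE SUBJECT_TABLE_NAME = 'COEP';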

The following remote source option can be leveraged to ignore DELETE operations coming from an archiving run: introduce a dedicated user that executes the archiving run. This must always be the same user.

 



 

We would have an ECC source system user, e.g. “ARCHIVE_USER”, that executes the archiving run. All operations conducted by that user are filtered out by the HANAAdapter, so no deletes happen in the target.

 

8.    References


Official SAP HANA SDI Documentation:

https://help.sap.com/viewer/p/HANA_SMART_DATA_INTEGRATION

  • Installation and Configuration Guide

  • Administration Guide

  • Modeling Guide


SDI Best Practices:

https://www.sap.com/documents/2016/07/d06029d1-7a7c-0010-82c7-eda71af511fa.html

 

Abbreviations



  • CDC = Change Data Capture

  • DP = Data Provisioning

  • FG = FlowGraph

  • H2H = HANA to HANA

  • RS = Remote Subscription

  • RT = Replication Task

  • SoH = Suite on HANA


 

I hope you can learn from the experiences we gained and perhaps adopt some of the approaches. I am open to questions or anything you want to challenge.

Best regards,

Stefan
52 Comments
nicholas_chang
Hi Stefan,

Thanks for the excellent blog! Somethings cool about "SDI"! Always want to get my hands dirty on advance SDI implementation like this but hardly get any chance.

One question, can this same scenario use on HANA to non-hana target, for e.g. HANA to Hadoop?

Cheers!

Nicholas Chang

 
Hi Nicholas,

thanks for your comment.

Am I understanding correctly:

You want to distribute data from your HDB to e.g. a Hadoop target using SDI data flows (RepTask/FlowGraph)? You want to do that batch-based/scheduled or in real-time?

Best regards,

Stefan
former_member488768
Hi Stefan

Thanks! Excellent blog and very informative.

We're also implementing a SDI project to extract data from ECC and it's good to know I'm not the only one using notepad++ to edit RTs.

We had experienced an issue where partitions can't be applied along with filters. Let's say there's a huge table containing data since 2000 and we're only interested to load past 3 years, and also partition is preferred. But we noticed that partition will ignore the filters in the remote statement as well as in the log file. Not sure if you've seen the same issue?

Cheers

Arthur
Hi Arthur,

thank you.

Can you tell which DP-Agent version, HANA revision and IM_DP version you are using? Are you using ECCAdapter or HANAAdapter? The wrong behavior you're describing that filters are not pushed down to the where clause sounds a bit familiar...

Cheers Stefan
former_member488768
Hi Stefan

We're using ECC Adapter.

DP Agent 2.2.2

HANA Revision 1.00.122.12

IM_DP 2.2.2

 

Cheers

Arthur
nicholas_chang
Hi Stefan,

we would interested for real time, but batch job with 1-3 hours is acceptable too. Any advise?

 

cheers,

Nicholas

 
Hi Arthur,

can you x-check if the following SAP Note is applicable in your case?

2560754 - Projection filter is not pushed down to partitions when task partitioning is used

 

Best regards,

Stefan
sudha_rao2
Hi Stefan,

Thanks for the detailed explanation.

I have few outstanding questions based on our project scenario : Would be great if you can answer those :

  1. Can we schedule the SDI replication, which replicates tables into the Catalog physical schema directly ( like how SLT does)

  2. Do we always have to create a target table for the SDI replication, or does it create a default table using the same name as source with the same data type structures?

  3. Can we schedule the SLT replication as real time or non real time ( again like how it is in SLT).


Kindly confirm.

Thanks,

Sudha
Hi Sudha,

here are my answers:

1.) In SDI you can also schedule data flows

2.) Using replication tasks in SDI will allow you to auto-create the target table (...also defining additional columns etc. when need)

3.) Both real-time/non real-time is possible but that depends on the source you are pulling data from (compare with either SDI Config/Admin Guide or refer to SDI PAM)

Hope this helps,

Best regards,

Stefan
former_member488768
Thanks Stephan, sorry for the late reply.

The note 2560754 was the right one and SAP also advised the issue will remain on HANA 1.X and won't be fixed till 2.X.

Cheers

Arthur
atul_kulkarni9
Hi Stefan,

 

Thanks It's really useful.

 

Can you please give details about S 4 HANA ABAP CDS view as source with SDI.

Thanks in advance.

Best Regards,
Atul
Hello Stefan,

Thanks for the excellent blog about SDI

I have an question. In our scenario, we have replicate from HANA to HANA

We have Replicate from ECC HANA DB (Host-XYZ) to BW HANA DB (Host-ABC) - On top of this we use BO for real time reporting.

Now, where do we deploy DP agent? on Host ABC? or On HOST XYZ? or do we need separate host for SP agent that is near to ECC HANA DB(Host-XYZ)?

Please clarify

Thanks in advance

Best Regards,

Kanagu.S
former_member561294
Excellent Blog, thanks for sharing!!
Former Member
per best practice, it is suggested to have DP agent as a seperate server near the source. In this case should be your ECC.

 

"We generally recommend that you install the Data Provisioning Agent close to the data sources. If you have data sources scattered across multiple geographical locations separated by distance and network latency, you can deploy multiple Data Provisioning Agents. Install at least one in each geographical location."
former_member560453
Hello Stefan,

Thanks for the excellent blog about SDI

I have an question. In our scenario, we will replicate data from SAP ERP (Oracle DB) to HANA. We  plan to replicate around 500 tables, and of course, we would like to avoid to create reptask one by one.

We want to create a script, with as input, the list of table, and then generate the XML for all tables. However, how do we (how did you) deal with the tag <TargetColumns><TargetColumn>?

We try this:

<TargetObjects SchemaName="TEST">
<TargetObject Type="TABLE" ObjectName="MARA">
<TargetColumns></TargetColumns>
</TargetObject></TargetObjects>
<Mappings>
<Mapping ObjectName="MARA" RemoteObjectUniqueName="MARA" FilterExpression="" ReplicationBehavior="InitLoadOnly" VTObjectName="MARA">
<TargetColumns></TargetColumns>
</Mapping>
</Mappings>

It didn't work. error =>  column store error: task framework: [140004] scenario -> QUERY - Operation output requires at least one attribute

Do we have to specify, for each tables, the columns? If yes, I'm not sure to understand what your python script do.

Thanks in advance

Best Regards,

Pierre
Hi Atul,

to my knowledge you should be able to leverage CDS views using the ABAP adapter.

Please go ahead and check.

I haven't checked myself but as far as I can remember some colleagues of mine did this.

Best regards,

Stefan
Hi Kanagarasu,

as the SDI documentation is outlining, the DP agent should be deployed "as close" as possible to  the data source (from a network topology point of view).

You can either install it directly on the machine where your source DB sits or you have a dedicated (can be virtual) server. For production purposes a dedicated machine is the preferred way, for pilot scenarios you might install it directly on the source.

Best regards,

Stefan
Hi Pierre,

I think there was a misunderstanding in the section where I describe the Python script usage.

The script we have used did not create the replication task files up from scratch.

What we did using pythons are replace operations in the XML structure or optionally small extensions when new columns had to be introduced. The columns + properties + exact syntax we prepared upfront and then ingested into the script as variables/parameters.

Potentially it is thinkable to automate the creation of replication task objects. For this purpose however some more scripting would be required, also leveraging python HANA drivers to access the DB, get metadata etc.

Best regards,
Stefan
gertjan_oostrum
Hi Stefan,

First of all, very usefull blog!

We are working on a similar setup, over 20 tables to be replicated in real time. We have completed the initial load.

If I check the Data Provisioning Design Time Monitor (sap/hana/im/dp/monitor/index.html) I see all the replication tasks executed succesfully.

The Remote Subscription Monitor is showing subscription state, Replicating changes in real-time and the received count is adding properly the newly added data in the source. Also the last message received time is getting properly updated with a new timestamp (which is more recent than the initial load)

If I check the Remote Subscription Statistics I do see also a Received Count, Applied Count.

The last message received and applied however remains on the initial load timestamp.

However if I validate the data, I only see for 2 of my replication tasks data being added in the target tables. It looks like the other remote subscriptions are queuing their data and not 'offloading' into the target table.

If I check the table M_REMOTE_SUBSCRIPTIONS, all my subscriptions are having the same state "APPLY_CHANGE_DATA" which is as you point out the real time replication status.

Is there anything we need to trigger to start the writing of the data (need manually trigger or something like that) ?
david_nguyen4
Hi Stefan,

Great blog, thanks for sharing.  I have a quick question about the Maintenance User Filter option.  Can you have multiple usernames or restricted to just one?

Thanks,

David
Hi David,

to my knowledge it is one user you can maintain per remote source.

Best regards,

Stefan
Former Member
Hi Stefan,

Great blog, thanks for sharing.  I have a quick question about the SDI replication of tables in to Target system HANA. we are planning to adapt SDI.

Let me give you brief about the issue currently we are facing while using SDA(Smart Data Access).

We have side-car scenario: replication of virtual tables into HANA using SDA.

We are using HANA SPS12 version.

We read data from APO system using SDA connection in HANA.

The Volume of data from APO is 150 million on daily basis. We use SDA to load Data into HANA using Stored Procedures.

We have two Store procedures both will hit SDA for different tables in APO.

When we execute two procedures parallel consumes more time than executing sequentially.

For example, i.e. execution of two procedures at a time consumes 10 mins., where as sequential execution consumes 5 mins. (2 min + 3 min).

Can SDI will help to improve the performance of parallel access.

BR,

Saichandra

 
Scheduling the SDI and SDA jobs in HANA XSA WebIDE

In HANA 1.0 XSC we had very user-friendly interface to schedule and monitor the SDI jobs. But I am not able to see the XSA flowgraphs in earlier Data provisioning design time object monitor. Do we have separate similar dashboards to schedule and monitor XSA SDI jobs now? If it is; please could you share some documents on how to activate them in HANA XSA.

Thanks,

BR,

Chandra.
hoang_ti_no_ho
Hi Stefan,

 

very usefull blog and I have some questions:

I have been trying to generate an artifact of replication task (.hdbreptask) in XML format without using RepTask-Editor and deploy the artifact.

1) Generation of artifact and deploy it using odbc-adapter should be working fine.

2) Generation of artifact and deploy it using Odata-adapter always raised the following error:

"could not activate replication task: Database error 403: internal error: Failed to create virtual table: sql syntax error: incorrect syntax near """: line 1 col 57 (at pos 57)"

Could you please advise me how to correct the error?

Is XML-syntax for both adapter the same or different?

Where can I get the docu about xml-syntax?

thank you so much.

best regards,

Ti No
Hi Ti No,

I am not aware of any official xml related documentation for SDI replication tasks or flowgraphs.

If you still want to continue generating it on your own I suggest you create yourself a replication task based on a virtual table that sits on a remote source of an ODataAdapter via the Web Development Workbench. Take the result and interpret the structure of the xml.

I can't tell you if there are any differences and I would have to check myself.

Best regards,
Stefan
hoang_ti_no_ho
Hi Stefan,

 

many thanks for your answer.

Your proposal is exact what I wanted to do, but unfortunately I don't have the system for this scenario (Web Development Workbench + remote source of OData Adapter). What I have is only the scenario Web Develompment Workbench + remote source of odbc adapter and remote source of SparkSQL adapter.

I am very greatly appreciated, if you can share me the structure of xml generated by Web Development Workbench with OData adapter.

best regards,

Ti No
amitcruz
Hello Stefan

Thanks for the wonderful article.

My question is related to HANA Replication task (hdbreptask) in HANA web based development workbench. The filter option seems not working . It means when I extract the data from ERP to virtual tables in HANA Cloud DB the filter was not working as expected. But in flowgraph (hdbflowgraph)  filter seems to work. But my requirement is to extract the filtered data from backend SAP ERP and push it to HANA virtual tables .

 

Please help.

 

Amit
Hi Amit,

which HANA + SDI release/PLs are you using?

Could you check the executed remote statement in your traces or the SDA monitor for the initial load of your reptask?

Best regards,

Stefan
amitcruz
Thanks Stefan filter is working now. I was not checking the persistent tables. I have one more query related to OData adapter , just want to know if there is a way to pass the dynamic values to Query . Infact is there any way if we can pass dynamic value to our filter instead of static values in SDI Adapters.

Our requirement is we want the output of FMs in ECC to be pulled in SCP HANA D B via OData adapter on dynamic values like Alpha conversions , Unit conversions , Read long text etc.

Is there a way to read FMs via SDI through remote source , I know we can use BAPI .
amitcruz
Hello Stefan,

I have a requirement to extract the data from ECC system via HANA SDI ( using ABAP adapter ) into HANA Database of SAP Cloud Platform. I have created a Remote Source and Replication task ( with initial load ) and able to connect with ECC system successfully .

Now I want to capture the changes which was done in the backend ECC tables using filter. I have created a REPTASK with replication behavior as 'Initial + realtime" but unfortunately getting below error message :

[com.shell.TestSDI:DOCTP.hdbreptask] column not allowed: "MANDT" in select clause: line 1 col 8 (at pos 7)

Please refer below image :



If I remove the filter ( "ATLAS_DOCTYP" = 'I' )  then REPTASK is getting saved but during execution it is giving below error :

InternalError: dberror($.hdb.Connection.executeProcedure): 256 - SQL error, server error code: 256. sql processing error: "TESTSDI"."com.shell.TestSDI::DOCTP.START_REPLICATION": line 42 col 4 (at pos 1812): [256] (range 3) sql processing error exception: sql processing error: QUEUE: com.shell.TestSDI::DOCTP.SUB_VT_ABAPTABLES_DS1_FI_MT_DOCTP: Failed to optimize the subscription query string for remote subscription com.shell.TestSDI::DOCTP.SUB_VT_ABAPTABLES_DS1_FI_MT_DOCTP.Error: column not allowed: "MANDT" in select clause at /sapmnt/ld7272/a/HDB/jenkins_prod/workspace/HANA__FA_CO_LIN64GCC48HAPPY_rel_fa~hana1sp12/s/ptime/query/checker/check_clause.cc:4987: line 1 col 1 (at pos 0) at /sapmnt/ld7272/a/HDB/jenkins_prod/workspace/HANA__FA_CO_LIN64GCC48HAPPY_rel_fa~hana1sp12/s/ptime/query/plan_executor/trex_wrapper/trex_wrapper_body/trex_llvm.cc:914

Sometimes I am getting this error during replication task :

InternalError: dberror($.hdb.Connection.executeProcedure): 686 - SQL error, server error code: 686. start task error: "TESTSDI"."com.shell.TestSDI::DOCTP.START_REPLICATION": line 3 col 1 (at pos 148): [686] (range 3) start task error exception: start task error: [2620] executor: plan operation failed;Error executing "INSERT INTO "TESTSDI"."com.shell.TestSDI::DOCTP.ABAPTABLES_DS1_FI_MT_DOCTP" ("BLART", "SHKZG", "ATLAS_DOCTYP") SELECT "BLART", "SHKZG", "ATLAS_DOCTYP" FROM "TESTSDI"."_SYS_CE__popid_4_5CF0D47EC7E85229E10000000A4E721F_26"": Could not insert into table due to a unique constraint violation. at /sapmnt/ld7272/a/HDB/jenkins_prod/workspace/HANA__FA_CO_LIN64GCC48HAPPY_rel_fa~hana1sp12/s/ptime/query/plan_executor/trex_wrapper/trex_wrapper_body/trex_llvm.cc:914

 

Which replication behavior is used to get the changes done in the ECC system ( remote system ) in HANA DB via SDI ABAP Adapter ?

Your expert advice is really needed here.

Thanks in advance.
Hi Amit,

I would have to check myself first - don't know the details there.

However I'd like to mention the streaming option you have in regards to the ABAP Adapter. To my knowledge, this is the only way to leverage this adapter in real-time mode.

Best regards,

Stefan
amitcruz
Thanks Stefan for replying .

I have figured out the problem. I forgot to check the  'Truncate table' checkbox in Replication task If we don't tick that option , it will not drop the table data and try to insert the same record and thus give Unique key constraint violation.

I have another business case related to HANA SDI which require expert advice from your side:

Lets say I want to extract the material from MARA based on some filter criteria and then description of material from MAKT table for those materials.

Solution:

I have created a REPTASK for the extraction of material from MARA table.  But the problem is that I cant create the standalone REPTASK of MAKT because material is not yet known to me because I cant feed the output of replication task ( Materials) as an input to another replication task.

Just want to know :

Is there any way if we can provide the output of one replication task to another replication task as input ?

Can we pass the output of HDBPROCEDURE  as an input to replication task ?

We are on HANA 1.0 SP12.

Looking forward for the advice.

Thanks in advance,

Amit

 
former_member618571
Stefan Geiselhart informative post. Appreciate sharing this. I have the same question as Nicholas Chang. Can we push data out of HANA to a Hadoop target in real time? Batch jobs are an option but its hard to achieve real-time data streaming with Batch jobs if we are dealing with 100's of queries for an given Datalake projects
Hi Pierre Combe,

 

We are trying to implement the same approach in our place.  Were you able to find a solution for it, if so could you please brief us on how did you do it?

 

 
Hi Stefan,

I have a query here  -

Currently i am doing real time replication from ECC to BW using SDI.  Real Time replication is running successfully.

Facing challenge with currency fields (Key figures) data. Values are not matching with ECC table.

In ECC BP000 table, Monthly Net Income is 44.000,000

In BW, Monthly Net Income is appearing as 440,000.00

 

How to handle currency fields ?
werner_daehn

What is the currency for that record? When you look at the TCURF you will find a multiplication-factor or divider. Could it be that?

 

https://www.se80.co.uk/saptables/t/tcur/tcurf.htm

Hi Amit,

your question dates back long time already and I hope you could resolve it in the meantime.

However what I can say regarding Reptasks:

  • Allows to replicate one or multiple source tables

  • No join options on source tables

  • No complex look-ups for filter criteria or the like


In simple words, there is no ad-hoc solution to the requirement you describe using Reptasks. Using Real-time Flowgraphs however, the situation would change of course...

Best regards and sorry for the long waiting time 😉

Stefan
former_member639487

Hello Stefan and Amit!

I have the same problem and yet I cannot solve it. I am using the "ABAP Adapter", but in all real-time options the same error comes up.

What is the replication behavior that should be used for this? Or which adapter is optimal for replicating ABAP tables with SDI in SAP HANA? Is there a guide? Thanks for your help.

amitcruz
Hello Martinez

ABAP adapter does not support real time replication. You need to use S-API extractor for your requirement that uses ODP queue and listen change in field and replicate in real time.
amitcruz
Thanks Stefan . We have achieved our requirement by using flowgraph and custom procedures.

 
Hi Stefan,

Thanks for the informative post. I have one question though I am not sure whether this is the correct thread or not.

How can I connect to BW (which has HANA database) as a target in case of SDI. Is it possible or not?

In BODS and HCI we get a dedicated option for BW as a Target. Is something like that available in SDI or not?
Hi Niranjan,

what is your specific requirement? Which BW version do you run?

Suppose you'd like to load data into some HANA tables which BW accesses. This will not be a challenge from SDI side and can be achieved using HANA Adapter. However, a "BW as a Target" option is not 1:1 available in SDI.

Best regards,

Stefan
Hi Stefan,

We have BW 7.4. Actually we have done some development in SAP Data Services(4.2 SP11) where source is SQL Server and target is "BW as a Target". We are doing feasibility check whether we can replicate same scenario in SDI as well.

We also have another scenario wherein we need to pull data from SuccessFactors and load to BW.

For both the scenarios we wanted to check the feasibility of SDI.

 

Regards,

Niranjan
very usefull blog and I have questions:

I'm getting deadlock issue while executing in single flow graph with multiple source into one single target table? have you face such issue in your project.? They might be running FGs in parallel. Can you please  suggest All possible options

 

Regards,

Abhijit
Towes71
Hi Stefan

Thank you for the Blog 🙂
We have noticed that when have maintenance on DPI/DPA or source system(ECC) we often get out of sync with RT replication and are force to do full reload.
(We do Suspend Capture and Distribution on the Remotesource.)

I have tried to find a Best Practice how to ensure we avoid getting in to this situation.

I'm a basis not developer and assume there things to check/change on the Subscription?

Greatful for pointers on howto ensure we can have maintenance on SDI/DPA and satelite systems (ie ECC)

Merry X-mas
Helle everyone,

currently I face an issue which i can't solve myself. Maybe I will find a solution by posting my challenge.

Scenario:

  • DB2 is connected via DB2LogReader Adapter to XSA using Smart Data Integration

  • Created a replication task which loads one single table

  • Load mode is "intial + realtime" and preserve all


Note: The scenario described above worked pretty fine for about two hours. We captured every single data change coming from the DB2 source system. At one point we truncated the table we had in use for testing directly within DB2. Since then I am no longer able to get a replication task mode "initial + realtime" running although "initial only" works as fine as before.

So the moment I execute the replication task in "initial + realtime" i receive the following error message:

Could not execute 'CALL "DELTA_XXX"."DataWareHouse.Database.Replication ...'
Error: (dberror) [686]: start task error: "DELTA_XXX"."DataWareHouse.Database.Replication Tasks::XXX_DELTA.START_REPLICATION": line 27 col 6 (at pos 1110): [140038] Error executing SQL command in task.;Error executing ALTER_REMOTE_SUBSCRIPTION_QUEUE with command: 'ALTER REMOTE SUBSCRIPTION "DELTA_XXX"."SUB_XXX" QUEUE'. ,exception 71000129: SQLException
exception 71000256: QUEUE: SUB_XXX: Failed to add subscription for remote subscription SUB_XXX[id = 16579466] in remote source XXX_LogReader[id = 14918173]. Error: exception 151050: CDC add subscription failed: RS[XXX_LogReader]: Failed to add the first subscription. Error: Failed to subscribe table ["SCHEMA"."XXX"]. Error: Initialization for replication of database <XXX> has not been done.

Maybe you guys have a working solution for me. Thanks in advance.
former_member762600
Stefan, I do like your blog...so clear... 1 question... out-of-the-box, once we installed SDI with S4 on the left, EWM in the center, and ME on the right... all Master and Transaction data are handled automatically for PO, PO Release, Stock Validation in EWM, ME does its operation, ... then PO is done. Now, does it integrate to traditional IM in MM or WM?  How much customization has to be done? Make our IDOC and /or BAPI calls  to integrate witjh IM separately? Do we have to modify the integration that in addtion to check with EWM, we want to check with IM for side-line inventory?
Hi Francis,

thanks for your comment & questions.

It is difficult to understand the context you have, especially in regards to SDI setup & config. I neither know what SDI options & adapters you use in the scenario, nor do I know the data flows you would like to build.

To me it seems there's a need for some consultancy in this and the blog post here is not the right platform to address entire use cases or give in-depth troubleshooting instructions.

Sorry about that.

Best regards, Stefan
RaviGanji

Hi Stefan,

Nice blog, Thanks for sharing your experiences!!

Reg. SDI replication using Hana adapter (Source and Target systems are SAP Hana database)

Here is my question -

Do you (or) in your circle have implemented (any use case) to verify the behavior of CDC change in the remote system by enabling the below parameter as part of remote source configuration?

  • Triggers record with PK only = true
  • Triggers record with PK only = false

Adding questions to clarify..

  1. On what factors, the developer would decide which approach to follow?
  2. If we choose any of the above approach, then what kind of benefits/improvisation, constraints and impact will have on remote source and target system during replication?

I am expecting comments/experiences to share who did hands-on for the scenario.

Looking forward!

Thanks,
Ravi Ganji

Manueldp
Hi

The operation of those parameters:

  • Triggers record with PK only = true


The shadow table stores only the pk of the source table, this has less impact on the performance of the source system, but a query of the PKs of the shadow table is made with the source table

  • Triggers record with PK only = false


The shadow table stores all field and modification but the query is only shadow and queue tables

Obviously these options depend on the number of rows, number of fields and the number of transactions per second in the source tables.
The PK only = true option, usually has more memory consumption
The PK only = false option penalizes approximately 30% on each transaction (Insert, update, delete) on source system compared to the first option.

 

Regards