
Hi HANA EIM and SDI Community,


In this blog entry I would like to share some of the experiences we gained throughout an SDI HANA to HANA (H2H) implementation project. To set the context, we will start off with the scenario description and the solution architecture.

These are the items that will be covered throughout this blog:

  1. Implementation Scope

  2. Solution Architecture

  3. Best Practices

  4. Challenges

  5. Reengineering of Replication Tasks

  6. Monitoring

  7. Real-time Replication & Source System Archiving

  8. References


You can expect practical insights into the implementation of a HANA to HANA replication scenario, including details on topics such as task partitioning, replication task design and monitoring. You may be able to adapt the approaches described in this blog to your own SDI implementation project.

 

1.    Implementation Scope


From an SDI perspective, this brief overview describes the facts and requirements we had to deal with:

  • Replicate data in real-time from 3 different HANA source systems into a (consolidated) target schema using SDI RTs (with the SDI HANAAdapter)

  • Replication scope of approx. 550 tables per source (times 3, i.e. more than 1,600 tables in total)

  • Replicate tables with high record counts (6 tables with more than 2 billion records each in production)

  • SDI task partitioning for large tables (> 200 million records)

  • Target table partitioning for large tables (> 200 million records)

  • SDI infrastructure/configuration – e.g. DP-Agent + Agent groups

  • Follow SDI best practice guidelines (naming convention, implementation guidelines, tuning)

  • SDI development artifacts maintenance + transport across landscape to PRD

  • Dpserver + dpagent monitoring

  • Out of scope: Load and replication of IBM DB2 based source systems (compare with architectural diagram)


 

2.    Solution Architecture


The end-to-end solution architecture employs several SAP and non-SAP components:

  • DP-Agents

    • Virtual host on Linux, 64 GB

    • Version 2.1.1



  • HANA 2 SP02

    • 4 TB



  • HANA EIM SDI (XSC runtime)

  • DLM

  • HANA Vora 1.4

  • Hadoop Cluster with Spark enabled

  • MicroStrategy


The following illustration shows the architecture in a simplified way. From an SDI point of view there are multiple real-time and batch input streams: Suite on HANA systems, files, and legacy data from IBM DB2 databases (not shown).

In the productive environment (as shown), each Suite on HANA system (HDB1/2/3) is connected via a dedicated DP-Agent group with its own HANAAdapter instance. This mitigates the risk of stalling the entire replication when remote source or RepTask exceptions occur at source system level. The Hadoop and Vora parts, shown on the right-hand side, will not be further elaborated and are not part of this blog entry.



 

3.    SDI Best Practices


Initially, most of the aspects (for users and authorizations) covered in the official SDI Best Practices Guide were implemented (refer to the references section for the link to the best practices).

SDI users were organized the following way:

  • SDI_ADMIN - monitoring privileges, user creation, ***

  • SDI_DEV - Web-based development workbench, repository privileges, schema privileges

  • SDI_EXEC - execute replication tasks

  • SDI_TRANSPORT - transport SDI artifacts


Using this pattern, you can easily follow a segregation-of-duties approach and avoid unnecessary and unwanted situations in development or deployment. On the other hand, you have to stick with the approach and align your development and administration processes accordingly.
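
As a minimal sketch of what this can look like (the exact privilege sets are listed in the SDI Best Practices Guide; the statements below are illustrative and incomplete, and the remote source name "RS_HDB1" is a placeholder):

--Illustrative only: create one of the SDI users and grant a few typical SDI privileges
CREATE USER SDI_DEV PASSWORD "<initial_password>" NO FORCE_FIRST_PASSWORD_CHANGE;
GRANT AGENT ADMIN, ADAPTER ADMIN TO SDI_ADMIN;
GRANT CREATE VIRTUAL TABLE, CREATE REMOTE SUBSCRIPTION ON REMOTE SOURCE "RS_HDB1" TO SDI_DEV;
GRANT PROCESS REMOTE SUBSCRIPTION EXCEPTION ON REMOTE SOURCE "RS_HDB1" TO SDI_ADMIN;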

 

4.    SDI Real-time Replication Design – Challenges


The following describes the major challenges we faced:

  1. Multiple sources into one target

  2. Replication Task Count

  3. Duration of Initial Load and Cut-over Time

  4. Disable the Capture of DDL Changes



  • 1. Multiple sources into one target

    • The goal is to consolidate all source data (from the SAP ECC systems) into one target schema. That means source tables of the same type are replicated into one consolidated target table.

    • Translated into a specific example:




The replication of e.g. table COEP is set up for all three source systems. The target for all sources is one table COEP. This target table must comply with all structural properties that exist in the sources (COEP tables across n source systems do not necessarily have exactly the same structure, at least in the given case; you might have system-specific appends), meaning different Z-/Y-fields across the COEP source tables. As RTs do not allow for flexible column mapping like FGs, this is how we resolved the requirement:
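
The screenshots illustrating the solution are not reproduced here. As a minimal sketch of the idea (all names and data types are placeholders, and only a few of the COEP key fields are spelled out): the consolidated target carries the superset of all source-specific appends plus an additional column identifying the source system (compare section 5, where the values '001'/'002'/'003' are checked):

--Illustrative only: consolidated target table with the superset of source-specific Z-/Y-appends
--and an additional column identifying the source system
CREATE COLUMN TABLE "TARGET_SCHEMA"."COEP" (
   "SOURCE_SYSTEM" NVARCHAR(3)  NOT NULL,   -- '001' / '002' / '003'
   "MANDT"         NVARCHAR(3)  NOT NULL,
   "KOKRS"         NVARCHAR(4)  NOT NULL,
   "BELNR"         NVARCHAR(10) NOT NULL,
   "BUZEI"         NVARCHAR(3)  NOT NULL,
   -- ...remaining standard COEP columns...
   "ZZFIELD_HDB1"  NVARCHAR(20),            -- append that only exists in one source
   "ZZFIELD_HDB3"  NVARCHAR(20),            -- append that only exists in another source
   PRIMARY KEY ("SOURCE_SYSTEM", "MANDT", "KOKRS", "BELNR", "BUZEI")
);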





 

  • 2. Replication Task Count

    • In scope were about 550 tables to be replicated, with different volumes and different change/delta frequencies. For the largest tables, dedicated remote sources were planned to be introduced; this translates into a remote source count of 7 for the largest SoH source.
      Since each remote source gets its own receiver/applier pair assigned (in the dpserver), this makes sense from a parallelization and performance perspective. On the downside, you have to create dedicated source system users, and you need to maintain and organize your solution considering the set of different remote sources that are in place. In the following illustration, each red arrow represents a separate remote source.




 



  • 3. Duration of Initial Load and Cut-over Time

    • We observed quite a few tables whose initial load consumed a lot of time (> 24 hours per table). By introducing task partitioning within the RepTasks, combined with a proper loading sequence, we could achieve major performance improvements. The challenge here is to find appropriate columns and value ranges. Imagine a table such as MARC (SoH material master plant data) with 1.9 billion records for which you should define proper ranges for your range-partitioning approach. How do you do that? The solution is to profile your data with generic SQL procedures or other tools. Perhaps you also have experts at hand who know the value distribution in those tables. This task can be tricky.

    • Check value distribution of columns that are candidates for partitioning:
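
For example (a sketch only; schema, table and column names are placeholders), a simple GROUP BY on the virtual table gives a first impression of the value distribution:

--Profile a partitioning candidate column on the source via its virtual table
SELECT "WERKS" AS candidate_value, COUNT(*) AS record_count
FROM "SDI_SOURCE"."VT_HDB1_MARC"
GROUP BY "WERKS"
ORDER BY record_count DESC;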






 



    • Apply partitioning settings in RT; here: Range Partitioning on field STUNR with two parallel partitions for the initial load:
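
In addition to the task partitioning configured in the RepTask editor, the large target tables were range partitioned as well (see the scope in section 1). In plain SQL this can look like the following sketch; table, column and boundary values are placeholders:

--Illustrative only: range partitioning of a large target table
ALTER TABLE "TARGET_SCHEMA"."COEP"
   PARTITION BY RANGE ("BELNR")
   (PARTITION '0000000000' <= VALUES < '5000000000',
    PARTITION OTHERS);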






 

  • 4. Disable the Capture of DDL Changes

    • The HANAAdapter is able to capture DDL changes (drop/add column). Obviously, if you load in real time from three sources into one target table, this influences the handling of DDL changes considerably: you cannot capture DDL changes anymore, as the (dynamic) structural differences between the source tables would cause inserts on the SDI applier side to fail.

    • A good option is therefore to set the DDL scan interval to 0, which means “disabled”. The default value is 10 (minutes), as you can see in the picture below:






 

5.    Reengineering Replication Tasks


Throughout the project, several requirement changes occurred, e.g. regarding RT structural details such as conversion expressions (you might know such change requests from your own projects 😉).

Therefore, some PowerShell and Python scripts were implemented to apply mass changes to the SDI objects. When you have around 1,600 RepTasks, you will be grateful not to have to touch each of them one by one. Doing this, you need to take precautions wherever possible: keep backups of the exports you take from the web-based development workbench. After exporting your SDI artifacts from the workbench, you can apply changes to the XML structure. You can also do this by opening a replication task in the workbench via right-click -> Open in text editor.

Whatever you do in terms of editing the XML directly, be aware of the find/replace operations you undertake and make sure that you only edit the XML properties you really need to change; otherwise your replication task structure is likely to get corrupted. After applying your changes locally, you can re-import the replication task into the web-based development workbench.

Here are a couple of examples we had to deal with. These were resolved by applying the respective scripts or by using common tools such as Notepad++ or Sublime Text.

  1. Replace the “Project Expression” in all RepTasks for a defined scope of columns (specific for each table) and enter proper conversion expressions, such as ABAP date NVARCHAR(8) -> HANA DATE or ABAP time NVARCHAR(6) -> HANA TIME.




 

As all RepTasks were initially designed to conduct a 1:1 replication (only one table per RepTask and no further conversion or project expressions), a Python script executed the following steps:

  • Iterate directory where RepTasks are stored, element by element

  • Match RepTask file with respective line in provided csv (proper identifiers required)

  • Find XML attribute for affected columns and replace element value “Project Expression”


In such a Python script, the class xml.etree.ElementTree helps to parse and edit the RepTask XML structures.

  2. Check the consistency and correctness of the values identifying the source system, which is defined at table level via an additional column:

    1. Source system 1 = ‘001’

    2. Source system 2 = ‘002’

    3. Source system 3 = ‘003’




Moreover, check if all RepTasks apply the correct filter criteria. This can also be done using e.g. Notepad++ to “scan” entire directories for certain patterns/values.

 

6.    SDI-Monitoring


Monitoring the SDI solution involves both the target HANA system and the DP-Agent(s). Apart from these “core” components, many sub-components come into play. It would be wrong to state that monitoring the DP-Agent and the DP framework on the HANA side alone is enough to keep your SDI solution in safe hands.

 



 

As SDI is a framework that is deeply embedded into the HANA database, you ideally need to understand all the inner mechanisms that come into play when data is processed. In this blog, however, we only dedicate our attention to SDI framework related monitoring options. Please do not expect to be able to fully monitor or troubleshoot your SDI solution after reading this; the following only provides some basic input and approaches we applied and experienced in our project.

To begin with, and most likely well known, there are the SDI monitoring views. These views unveil quite a lot of details about SDI design-time objects, ongoing loads and general statuses. They also provide some control mechanisms to handle replication and loads. However, when it comes to more detailed root-cause analysis, one needs to be familiar with the framework's tables, how traces are set, where log files are stored, etc. A good reference here is the official SDI Administration Guide (see the references section for the links).

In addition to the monitoring views, a solid set of prepared SQL statements or some custom-built stored procedures can of course help to facilitate your monitoring or troubleshooting activities. E.g. during the initial load of your RepTasks you typically keep querying some important views. Especially the M_REMOTE_SUBSCRIPTIONS view is helpful when it comes to monitoring a particular initial load or the CDC states. Its STATE column contains one of the following replication states, which should be well understood (you can also find this information in the SDI Administration Guide):

  • CREATED: The remote subscription is created by the replication task.

  • MAT_START_BEG_MARKER: The receiver is waiting for the begin marker that indicates the first changed data to queue while the initial load is running.

  • MAT_START_END_MARKER: The receiver queues the rows and is waiting for the end marker that indicates the last row of the initial load.

  • MAT_COMP_BEG_MARKER: The receiver is waiting for the begin marker that indicates the first row to queue after the initial load has completed.

  • MAT_COMP_END_MARKER: The receiver queues the changed rows and is waiting for the end marker that indicates the last row of the initial load. The initial load has completed and the end marker is sent to the adapter. If the state does not change to AUTO_CORRECT_CHANGE_DATA, the adapter or source system is slow in capturing the changes.

  • AUTO_CORRECT_CHANGE_DATA: When the end marker is received for the initial load, the applier loads the changed data captured (and queued during the initial load) to the target. If a lot of changes occurred after the initial load started, this state might take a long time to change to APPLY_CHANGE_DATA.

  • APPLY_CHANGE_DATA: All of the changes captured while the initial load was running have been applied, and ongoing changes are now loaded to the target.


 

Personally, we perceived the use of the monitoring views, as well as the SDA monitor in HANA Studio, as somewhat cumbersome. This is subjective and does not apply to all cases, of course. Below I present a set of helpful statements:
--Check for current source, if all subscriptions are in state APPLY_CHANGE_DATA – show differing subscriptions
SELECT * FROM M_REMOTE_SUBSCRIPTIONS WHERE SCHEMA_NAME = '<VIRTUAL_TABLE_SCHEMA_SOURCE>' AND STATE != '' AND STATE != 'APPLY_CHANGE_DATA';

--Check how many reptasks/remote subscriptions are in real-time replication state per remote source
SELECT SCHEMA_NAME, COUNT(*) FROM M_REMOTE_SUBSCRIPTIONS WHERE STATE = 'APPLY_CHANGE_DATA' GROUP BY SCHEMA_NAME;

--Query for exceptions on remote source/subscription level
SELECT * FROM M_REMOTE_SUBSCRIPTION_EXCEPTIONS;

--Check the applied remote statements during the initial load (SDA request) for a given table. You need to pick out the correct statement as multiple statements might match your search pattern
SELECT * FROM "SYS"."M_REMOTE_STATEMENTS" WHERE REMOTE_STATEMENT_STRING LIKE '%MARC%';

--Find running tasks. The initial load in the SDI context is wrapped into a HANA runtime task. Therefore you can keep track of its execution by querying the appropriate tables
SELECT TOP 100 * FROM "_SYS_TASK"."TASK_EXECUTIONS_" ORDER BY START_TIME DESC;

--In case of partitioned tasks, query the subsequent table
SELECT TOP 100 * FROM "_SYS_TASK"."TASK_PARTITION_EXECUTIONS_" ORDER BY START_TIME DESC;
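
One more view that can be useful (it is not part of the statement list above, but worth mentioning) is the remote subscription statistics view with its received/applied counters and timestamps per subscription:

--Check received/applied counters and timestamps per remote subscription
SELECT * FROM M_REMOTE_SUBSCRIPTION_STATISTICS;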

 

 Monitoring the DP-Agent

To keep track of what happens on the DP-Agent side, it is advisable to have a real-time log monitoring tool at hand, such as Array or Baretail, or, when running on Linux, a native tool of your choice (e.g. tail). Set the trace and log levels in the DP-Agent's ini files, make yourself familiar with the paths where traces and logs are stored, and access them with your tool of preference.

Here is a brief overview of the dpagentconfig.ini file properties for logging:

  • log.level = ALL/INFO/DEBUG - sets the trace level

  • log.maxBackupIndex = 10 - maximum number of trace files

  • log.maxFileSize = 10MB - maximum size of a trace file

  • trace.body = TRUE - log the entire body part of an operation


Correspondingly, you should see the respective trace/log files in the log path:

  • trc - trace file for the agent

  • Framework_alert - alert file for the framework

  • log - Log Reader rep agent log files

  • log - service log files


 

Data Assurance

In order to assure that source and target record counts are the same, you can use some prepared SQL statements to query the record counts of the source and target tables.

Possibly, you want to provide a SQL procedure that automatically matches source and target tables in terms of record counts across remote sources, e.g. taking into account the additional source system column in the consolidated target tables.
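
A minimal sketch of such a check for a single table and a single source system (all object names are placeholders; the virtual table points to the source, and the target carries the additional source system column):

--Compare source and target record counts for one table and one source system
SELECT
   (SELECT COUNT(*) FROM "SDI_SOURCE"."VT_HDB1_COEP") AS source_count,
   (SELECT COUNT(*) FROM "TARGET_SCHEMA"."COEP" WHERE "SOURCE_SYSTEM" = '001') AS target_count
FROM DUMMY;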

 

7.    Real-time Replication & Source System Archiving


Replication using the HANAAdapter is trigger based.

That is, database triggers exist on table level and react to DML operations such as INSERT/UPDATE/DELETE. If a table is being replicated and a record in the table is archived, the database trigger interprets the archiving operation as a DELETE statement, and SDI deletes the corresponding record from the target table.
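
If you want to verify this on the source side, you can for example list the triggers that the HANAAdapter created for a replicated table (a sketch; the table name is a placeholder):

--Source-side check: triggers created for a replicated table
SELECT TRIGGER_NAME, SUBJECT_TABLE_SCHEMA, SUBJECT_TABLE_NAME
FROM "SYS"."TRIGGERS"
WHERE SUBJECT_TABLE_NAME = 'COEP';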

The following remote source option can be leveraged to ignore DELETE operations coming from an archiving run: introduce a dedicated user that executes the archiving run. This must always be the same user.

 



 

We would have an ECC source system user, e.g. “ARCHIVE_USER”, that executes the archiving run. All operations conducted by that user are filtered out by the HANAAdapter, so no deletes happen in the target.

 

8.    References


Official SAP HANA SDI Documentation:

https://help.sap.com/viewer/p/HANA_SMART_DATA_INTEGRATION

  • Installation and Configuration Guide

  • Administration Guide

  • Modeling Guide


SDI Best Practices:

https://www.sap.com/documents/2016/07/d06029d1-7a7c-0010-82c7-eda71af511fa.html

 

Abbreviations



  • CDC = Change Data Capture

  • DP = Data Provisioning

  • FG = FlowGraph

  • H2H = HANA to HANA

  • RS = Remote Subscription

  • RT = Replication Task

  • SoH = Suite on HANA


 

I hope you can learn from the experiences we gained and perhaps adopt some of the approaches. I am open to questions or anything you want to challenge.

Best regards,

Stefan
52 Comments
nicholas_chang
Hi Stefan,

Thanks for the excellent blog! Somethings cool about "SDI"! Always want to get my hands dirty on advance SDI implementation like this but hardly get any chance.

One question, can this same scenario use on HANA to non-hana target, for e.g. HANA to Hadoop?

Cheers!

Nicholas Chang

 
Hi Nicholas,

thanks for your comment.

Am I understanding correctly:

You want to distribute data from your HDB to e.g. a Hadoop target using SDI data flows (RepTask/FlowGraph)? You want to do that batch-based/scheduled or in real-time?

Best regards,

Stefan
former_member488768
Hi Stefan

Thanks! Excellent blog and very informative.

We're also implementing a SDI project to extract data from ECC and it's good to know I'm not the only one using notepad++ to edit RTs.

We had experienced an issue where partitions can't be applied along with filters. Let's say there's a huge table containing data since 2000 and we're only interested to load past 3 years, and also partition is preferred. But we noticed that partition will ignore the filters in the remote statement as well as in the log file. Not sure if you've seen the same issue?

Cheers

Arthur
Hi Arthur,

thank you.

Can you tell which DP-Agent version, HANA revision and IM_DP version you are using? Are you using ECCAdapter or HANAAdapter? The wrong behavior you're describing that filters are not pushed down to the where clause sounds a bit familiar...

Cheers Stefan
former_member488768
Hi Stefan

We're using ECC Adapter.

DP Agent 2.2.2

HANA Revision 1.00.122.12

IM_DP 2.2.2

 

Cheers

Arthur
nicholas_chang
Hi Stefan,

we would interested for real time, but batch job with 1-3 hours is acceptable too. Any advise?

 

cheers,

Nicholas

 
Hi Arthur,

can you x-check if the following SAP Note is applicable in your case?

2560754 - Projection filter is not pushed down to partitions when task partitioning is used

 

Best regards,

Stefan
sudha_rao2
Hi Stefan,

Thanks for the detailed explanation.

I have few outstanding questions based on our project scenario : Would be great if you can answer those :

  1. Can we schedule the SDI replication, which replicates tables into the Catalog physical schema directly ( like how SLT does)

  2. Do we always have to create a target table for the SDI replication, or does it create a default table using the same name as source with the same data type structures?

  3. Can we schedule the SLT replication as real time or non real time ( again like how it is in SLT).


Kindly confirm.

Thanks,

Sudha
Hi Sudha,

here are my answers:

1.) In SDI you can also schedule data flows

2.) Using replication tasks in SDI will allow you to auto-create the target table (...also defining additional columns etc. when need)

3.) Both real-time/non real-time is possible but that depends on the source you are pulling data from (compare with either SDI Config/Admin Guide or refer to SDI PAM)

Hope this helps,

Best regards,

Stefan
former_member488768
Thanks Stephan, sorry for the late reply.

The note 2560754 was the right one and SAP also advised the issue will remain on HANA 1.X and won't be fixed till 2.X.

Cheers

Arthur
atul_kulkarni9
Hi Stefan,

 

Thanks It's really useful.

 

Can you please give details about S 4 HANA ABAP CDS view as source with SDI.

Thanks in advance.

Best Regards,
Atul
Hello Stefan,

Thanks for the excellent blog about SDI

I have an question. In our scenario, we have replicate from HANA to HANA

We have Replicate from ECC HANA DB (Host-XYZ) to BW HANA DB (Host-ABC) - On top of this we use BO for real time reporting.

Now, where do we deploy DP agent? on Host ABC? or On HOST XYZ? or do we need separate host for SP agent that is near to ECC HANA DB(Host-XYZ)?

Please clarify

Thanks in advance

Best Regards,

Kanagu.S
former_member561294
Excellent Blog, thanks for sharing!!
Former Member
per best practice, it is suggested to have DP agent as a seperate server near the source. In this case should be your ECC.

 

"We generally recommend that you install the Data Provisioning Agent close to the data sources. If you have data sources scattered across multiple geographical locations separated by distance and network latency, you can deploy multiple Data Provisioning Agents. Install at least one in each geographical location."
former_member560453
Hello Stefan,

Thanks for the excellent blog about SDI

I have an question. In our scenario, we will replicate data from SAP ERP (Oracle DB) to HANA. We  plan to replicate around 500 tables, and of course, we would like to avoid to create reptask one by one.

We want to create a script, with as input, the list of table, and then generate the XML for all tables. However, how do we (how did you) deal with the tag <TargetColumns><TargetColumn>?

We try this:

<TargetObjects SchemaName="TEST">
<TargetObject Type="TABLE" ObjectName="MARA">
<TargetColumns></TargetColumns>
</TargetObject></TargetObjects>
<Mappings>
<Mapping ObjectName="MARA" RemoteObjectUniqueName="MARA" FilterExpression="" ReplicationBehavior="InitLoadOnly" VTObjectName="MARA">
<TargetColumns></TargetColumns>
</Mapping>
</Mappings>

It didn't work. error =>  column store error: task framework: [140004] scenario -> QUERY - Operation output requires at least one attribute

Do we have to specify, for each tables, the columns? If yes, I'm not sure to understand what your python script do.

Thanks in advance

Best Regards,

Pierre
Hi Atul,

to my knowledge you should be able to leverage CDS views using the ABAP adapter.

Please go ahead and check.

I haven't checked myself but as far as I can remember some colleagues of mine did this.

Best regards,

Stefan
Hi Kanagarasu,

as the SDI documentation is outlining, the DP agent should be deployed "as close" as possible to  the data source (from a network topology point of view).

You can either install it directly on the machine where your source DB sits or you have a dedicated (can be virtual) server. For production purposes a dedicated machine is the preferred way, for pilot scenarios you might install it directly on the source.

Best regards,

Stefan
Hi Pierre,

I think there was a misunderstanding in the section where I describe the Python script usage.

The script we have used did not create the replication task files up from scratch.

What we did using pythons are replace operations in the XML structure or optionally small extensions when new columns had to be introduced. The columns + properties + exact syntax we prepared upfront and then ingested into the script as variables/parameters.

Potentially it is thinkable to automate the creation of replication task objects. For this purpose however some more scripting would be required, also leveraging python HANA drivers to access the DB, get metadata etc.

Best regards,
Stefan
gertjan_oostrum
Hi Stefan,

First of all, very usefull blog!

We are working on a similar setup, over 20 tables to be replicated in real time. We have completed the initial load.

If I check the Data Provisioning Design Time Monitor (sap/hana/im/dp/monitor/index.html) I see all the replication tasks executed succesfully.

The Remote Subscription Monitor is showing subscription state, Replicating changes in real-time and the received count is adding properly the newly added data in the source. Also the last message received time is getting properly updated with a new timestamp (which is more recent than the initial load)

If I check the Remote Subscription Statistics I do see also a Received Count, Applied Count.

The last message received and applied however remains on the initial load timestamp.

However if I validate the data, I only see for 2 of my replication tasks data being added in the target tables. It looks like the other remote subscriptions are queuing their data and not 'offloading' into the target table.

If I check the table M_REMOTE_SUBSCRIPTIONS, all my subscriptions are having the same state "APPLY_CHANGE_DATA" which is as you point out the real time replication status.

Is there anything we need to trigger to start the writing of the data (need manually trigger or something like that) ?
david_nguyen4
Hi Stefan,

Great blog, thanks for sharing.  I have a quick question about the Maintenance User Filter option.  Can you have multiple usernames or restricted to just one?

Thanks,

David
Hi David,

to my knowledge it is one user you can maintain per remote source.

Best regards,

Stefan
Former Member
Hi Stefan,

Great blog, thanks for sharing.  I have a quick question about the SDI replication of tables in to Target system HANA. we are planning to adapt SDI.

Let me give you brief about the issue currently we are facing while using SDA(Smart Data Access).

We have side-car scenario: replication of virtual tables into HANA using SDA.

We are using HANA SPS12 version.

We read data from APO system using SDA connection in HANA.

The Volume of data from APO is 150 million on daily basis. We use SDA to load Data into HANA using Stored Procedures.

We have two Store procedures both will hit SDA for different tables in APO.

When we execute two procedures parallel consumes more time than executing sequentially.

For example, i.e. execution of two procedures at a time consumes 10 mins., where as sequential execution consumes 5 mins. (2 min + 3 min).

Can SDI will help to improve the performance of parallel access.

BR,

Saichandra

 
Scheduling the SDI and SDA jobs in HANA XSA WebIDE

In HANA 1.0 XSC we had very user-friendly interface to schedule and monitor the SDI jobs. But I am not able to see the XSA flowgraphs in earlier Data provisioning design time object monitor. Do we have separate similar dashboards to schedule and monitor XSA SDI jobs now? If it is; please could you share some documents on how to activate them in HANA XSA.

Thanks,

BR,

Chandra.
hoang_ti_no_ho
Hi Stefan,

 

very usefull blog and I have some questions:

I have been trying to generate an artifact of replication task (.hdbreptask) in XML format without using RepTask-Editor and deploy the artifact.

1) Generation of artifact and deploy it using odbc-adapter should be working fine.

2) Generation of artifact and deploy it using Odata-adapter always raised the following error:

"could not activate replication task: Database error 403: internal error: Failed to create virtual table: sql syntax error: incorrect syntax near """: line 1 col 57 (at pos 57)"

Could you please advise me how to correct the error?

Is XML-syntax for both adapter the same or different?

Where can I get the docu about xml-syntax?

thank you so much.

best regards,

Ti No
Hi Ti No,

I am not aware of any official xml related documentation for SDI replication tasks or flowgraphs.

If you still want to continue generating it on your own I suggest you create yourself a replication task based on a virtual table that sits on a remote source of an ODataAdapter via the Web Development Workbench. Take the result and interpret the structure of the xml.

I can't tell you if there are any differences and I would have to check myself.

Best regards,
Stefan
hoang_ti_no_ho
Hi Stefan,

 

many thanks for your answer.

Your proposal is exact what I wanted to do, but unfortunately I don't have the system for this scenario (Web Development Workbench + remote source of OData Adapter). What I have is only the scenario Web Develompment Workbench + remote source of odbc adapter and remote source of SparkSQL adapter.

I am very greatly appreciated, if you can share me the structure of xml generated by Web Development Workbench with OData adapter.

best regards,

Ti No
amitcruz
Hello Stefan

Thanks for the wonderful article.

My question is related to HANA Replication task (hdbreptask) in HANA web based development workbench. The filter option seems not working . It means when I extract the data from ERP to virtual tables in HANA Cloud DB the filter was not working as expected. But in flowgraph (hdbflowgraph)  filter seems to work. But my requirement is to extract the filtered data from backend SAP ERP and push it to HANA virtual tables .

 

Please help.

 

Amit
Hi Amit,

which HANA + SDI release/PLs are you using?

Could you check the executed remote statement in your traces or the SDA monitor for the initial load of your reptask?

Best regards,

Stefan
amitcruz
Thanks Stefan filter is working now. I was not checking the persistent tables. I have one more query related to OData adapter , just want to know if there is a way to pass the dynamic values to Query . Infact is there any way if we can pass dynamic value to our filter instead of static values in SDI Adapters.

Our requirement is we want the output of FMs in ECC to be pulled in SCP HANA D B via OData adapter on dynamic values like Alpha conversions , Unit conversions , Read long text etc.

Is there a way to read FMs via SDI through remote source , I know we can use BAPI .
amitcruz
Hello Stefan,

I have a requirement to extract the data from ECC system via HANA SDI ( using ABAP adapter ) into HANA Database of SAP Cloud Platform. I have created a Remote Source and Replication task ( with initial load ) and able to connect with ECC system successfully .

Now I want to capture the changes which was done in the backend ECC tables using filter. I have created a REPTASK with replication behavior as 'Initial + realtime" but unfortunately getting below error message :

[com.shell.TestSDI:DOCTP.hdbreptask] column not allowed: "MANDT" in select clause: line 1 col 8 (at pos 7)

Please refer below image :



If I remove the filter ( "ATLAS_DOCTYP" = 'I' )  then REPTASK is getting saved but during execution it is giving below error :

InternalError: dberror($.hdb.Connection.executeProcedure): 256 - SQL error, server error code: 256. sql processing error: "TESTSDI"."com.shell.TestSDI::DOCTP.START_REPLICATION": line 42 col 4 (at pos 1812): [256] (range 3) sql processing error exception: sql processing error: QUEUE: com.shell.TestSDI::DOCTP.SUB_VT_ABAPTABLES_DS1_FI_MT_DOCTP: Failed to optimize the subscription query string for remote subscription com.shell.TestSDI::DOCTP.SUB_VT_ABAPTABLES_DS1_FI_MT_DOCTP.Error: column not allowed: "MANDT" in select clause at /sapmnt/ld7272/a/HDB/jenkins_prod/workspace/HANA__FA_CO_LIN64GCC48HAPPY_rel_fa~hana1sp12/s/ptime/query/checker/check_clause.cc:4987: line 1 col 1 (at pos 0) at /sapmnt/ld7272/a/HDB/jenkins_prod/workspace/HANA__FA_CO_LIN64GCC48HAPPY_rel_fa~hana1sp12/s/ptime/query/plan_executor/trex_wrapper/trex_wrapper_body/trex_llvm.cc:914

Sometimes I am getting this error during replication task :

InternalError: dberror($.hdb.Connection.executeProcedure): 686 - SQL error, server error code: 686. start task error: "TESTSDI"."com.shell.TestSDI::DOCTP.START_REPLICATION": line 3 col 1 (at pos 148): [686] (range 3) start task error exception: start task error: [2620] executor: plan operation failed;Error executing "INSERT INTO "TESTSDI"."com.shell.TestSDI::DOCTP.ABAPTABLES_DS1_FI_MT_DOCTP" ("BLART", "SHKZG", "ATLAS_DOCTYP") SELECT "BLART", "SHKZG", "ATLAS_DOCTYP" FROM "TESTSDI"."_SYS_CE__popid_4_5CF0D47EC7E85229E10000000A4E721F_26"": Could not insert into table due to a unique constraint violation. at /sapmnt/ld7272/a/HDB/jenkins_prod/workspace/HANA__FA_CO_LIN64GCC48HAPPY_rel_fa~hana1sp12/s/ptime/query/plan_executor/trex_wrapper/trex_wrapper_body/trex_llvm.cc:914

 

Which replication behavior is used to get the changes done in the ECC system ( remote system ) in HANA DB via SDI ABAP Adapter ?

Your expert advice is really needed here.

Thanks in advance.
Hi Amit,

I would have to check myself first - don't know the details there.

However I'd like to mention the streaming option you have in regards to the ABAP Adapter. To my knowledge, this is the only way to leverage this adapter in real-time mode.

Best regards,

Stefan
amitcruz
Thanks Stefan for replying .

I have figured out the problem. I forgot to check the  'Truncate table' checkbox in Replication task If we don't tick that option , it will not drop the table data and try to insert the same record and thus give Unique key constraint violation.

I have another business case related to HANA SDI which require expert advice from your side:

Lets say I want to extract the material from MARA based on some filter criteria and then description of material from MAKT table for those materials.

Solution:

I have created a REPTASK for the extraction of material from MARA table.  But the problem is that I cant create the standalone REPTASK of MAKT because material is not yet known to me because I cant feed the output of replication task ( Materials) as an input to another replication task.

Just want to know :

Is there any way if we can provide the output of one replication task to another replication task as input ?

Can we pass the output of HDBPROCEDURE  as an input to replication task ?

We are on HANA 1.0 SP12.

Looking forward for the advice.

Thanks in advance,

Amit

 
former_member618571
Stefan Geiselhart informative post. Appreciate sharing this. I have the same question as Nicholas Chang. Can we push data out of HANA to a Hadoop target in real time? Batch jobs are an option but its hard to achieve real-time data streaming with Batch jobs if we are dealing with 100's of queries for an given Datalake projects
Hi Pierre Combe,

 

We are trying to implement the same approach in our place.  Were you able to find a solution for it, if so could you please brief us on how did you do it?

 

 
Hi Stefan,

I have a query here  -

Currently i am doing real time replication from ECC to BW using SDI.  Real Time replication is running successfully.

Facing challenge with currency fields (Key figures) data. Values are not matching with ECC table.

In ECC BP000 table, Monthly Net Income is 44.000,000

In BW, Monthly Net Income is appearing as 440,000.00

 

How to handle currency fields ?
werner_daehn

What is the currency for that record? When you look at the TCURF you will find a multiplication-factor or divider. Could it be that?

 

https://www.se80.co.uk/saptables/t/tcur/tcurf.htm

Hi Amit,

your question dates back long time already and I hope you could resolve it in the meantime.

However what I can say regarding Reptasks:

  • Allows to replicate one or multiple source tables

  • No join options on source tables

  • No complex look-ups for filter criteria or the like


In simple words, there is no ad-hoc solution to the requirement you describe using Reptasks. Using Real-time Flowgraphs however, the situation would change of course...

Best regards and sorry for the long waiting time 😉

Stefan
former_member639487

Hello Stefan and Amit!

I have the same problem and yet I cannot solve it. I am using the "ABAP Adapter", but in all real-time options the same error comes up.

What is the replication behavior that should be used for this? Or which adapter is optimal for replicating ABAP tables with SDI in SAP HANA? Is there a guide? Thanks for your help.

amitcruz
Hello Martinez

ABAP adapter does not support real time replication. You need to use S-API extractor for your requirement that uses ODP queue and listen change in field and replicate in real time.
amitcruz
Thanks Stefan . We have achieved our requirement by using flowgraph and custom procedures.

 
Hi Stefan,

Thanks for the informative post. I have one question though I am not sure whether this is the correct thread or not.

How can I connect to BW (which has HANA database) as a target in case of SDI. Is it possible or not?

In BODS and HCI we get a dedicated option for BW as a Target. Is something like that available in SDI or not?
Hi Niranjan,

what is your specific requirement? Which BW version do you run?

Suppose you'd like to load data into some HANA tables which BW accesses. This will not be a challenge from SDI side and can be achieved using HANA Adapter. However, a "BW as a Target" option is not 1:1 available in SDI.

Best regards,

Stefan
Hi Stefan,

We have BW 7.4. Actually we have done some development in SAP Data Services(4.2 SP11) where source is SQL Server and target is "BW as a Target". We are doing feasibility check whether we can replicate same scenario in SDI as well.

We also have another scenario wherein we need to pull data from SuccessFactors and load to BW.

For both the scenarios we wanted to check the feasibility of SDI.

 

Regards,

Niranjan
very usefull blog and I have questions:

I'm getting deadlock issue while executing in single flow graph with multiple source into one single target table? have you face such issue in your project.? They might be running FGs in parallel. Can you please  suggest All possible options

 

Regards,

Abhijit
Towes71
Hi Stefan

Thank you for the Blog 🙂
We have noticed that when have maintenance on DPI/DPA or source system(ECC) we often get out of sync with RT replication and are force to do full reload.
(We do Suspend Capture and Distribution on the Remotesource.)

I have tried to find a Best Practice how to ensure we avoid getting in to this situation.

I'm a basis not developer and assume there things to check/change on the Subscription?

Greatful for pointers on howto ensure we can have maintenance on SDI/DPA and satelite systems (ie ECC)

Merry X-mas
Helle everyone,

currently I face an issue which i can't solve myself. Maybe I will find a solution by posting my challenge.

Scenario:

  • DB2 is connected via DB2LogReader Adapter to XSA using Smart Data Integration

  • Created a replication task which loads one single table

  • Load mode is "intial + realtime" and preserve all


Note: The scenario described above worked pretty fine for about two hours. We captured every single data change coming from the DB2 source system. At one point we truncated the table we had in use for testing directly within DB2. Since then I am no longer able to get a replication task mode "initial + realtime" running although "initial only" works as fine as before.

So the moment I execute the replication task in "initial + realtime" i receive the following error message:

Could not execute 'CALL "DELTA_XXX"."DataWareHouse.Database.Replication ...'
Error: (dberror) [686]: start task error: "DELTA_XXX"."DataWareHouse.Database.Replication Tasks::XXX_DELTA.START_REPLICATION": line 27 col 6 (at pos 1110): [140038] Error executing SQL command in task.;Error executing ALTER_REMOTE_SUBSCRIPTION_QUEUE with command: 'ALTER REMOTE SUBSCRIPTION "DELTA_XXX"."SUB_XXX" QUEUE'. ,exception 71000129: SQLException
exception 71000256: QUEUE: SUB_XXX: Failed to add subscription for remote subscription SUB_XXX[id = 16579466] in remote source XXX_LogReader[id = 14918173]. Error: exception 151050: CDC add subscription failed: RS[XXX_LogReader]: Failed to add the first subscription. Error: Failed to subscribe table ["SCHEMA"."XXX"]. Error: Initialization for replication of database <XXX> has not been done.

Maybe you guys have a working solution for me. Thanks in advance.
former_member762600
Stefan, I do like your blog...so clear... 1 question... out-of-the-box, once we installed SDI with S4 on the left, EWM in the center, and ME on the right... all Master and Transaction data are handled automatically for PO, PO Release, Stock Validation in EWM, ME does its operation, ... then PO is done. Now, does it integrate to traditional IM in MM or WM?  How much customization has to be done? Make our IDOC and /or BAPI calls  to integrate witjh IM separately? Do we have to modify the integration that in addtion to check with EWM, we want to check with IM for side-line inventory?
Hi Francis,

thanks for your comment & questions.

It is difficult to understand the context you have, especially in regards to SDI setup & config. I neither know what SDI options & adapters you use in the scenario, nor do I know the data flows you would like to build.

To me it seems there's a need for some consultancy in this and the blog post here is not the right platform to address entire use cases or give in-depth troubleshooting instructions.

Sorry about that.

Best regards, Stefan
RaviGanji

Hi Stefan,

Nice blog, Thanks for sharing your experiences!!

Reg. SDI replication using Hana adapter (Source and Target systems are SAP Hana database)

Here is my question -

Do you (or) in your circle have implemented (any use case) to verify the behavior of CDC change in the remote system by enabling the below parameter as part of remote source configuration?

  • Triggers record with PK only = true
  • Triggers record with PK only = false

Adding questions to clarify..

  1. On what factors, the developer would decide which approach to follow?
  2. If we choose any of the above approach, then what kind of benefits/improvisation, constraints and impact will have on remote source and target system during replication?

I am expecting comments/experiences to share who did hands-on for the scenario.

Looking forward!

Thanks,
Ravi Ganji

Manueldp
Hi

The operation of those parameters:

  • Triggers record with PK only = true


The shadow table stores only the pk of the source table, this has less impact on the performance of the source system, but a query of the PKs of the shadow table is made with the source table

  • Triggers record with PK only = false


The shadow table stores all field and modification but the query is only shadow and queue tables

Obviously these options depend on the number of rows, number of fields and the number of transactions per second in the source tables.
The PK only = true option, usually has more memory consumption
The PK only = false option penalizes approximately 30% on each transaction (Insert, update, delete) on source system compared to the first option.

 

Regards