Mastan
Advisor

In this article, we’ll take a look at one of the new features of SAP Datasphere: the new Replication Flow.

Background:

Replication capability has long been available in SAP Datasphere through Smart Data Integration (SDI), and SAP is not going to remove it. With the new Replication Flow, SAP essentially brings in an additional cloud-based replication tool. This cloud-based data replication tool is designed to simplify data integration by eliminating the need for additional on-premises components. This means it does not rely on the DP Server/DP Agent technology, which requires installation and maintenance, but instead uses the embedded Data Intelligence environment and Data Intelligence connectors to connect to remote sources.

User Interface: 

When it comes to the user experience, Replication Flows are integrated into the existing Data Builder. Monitoring comes with it as well: replication flow monitoring is part of the existing Data Integration Monitor, where users already find the monitoring for data flows today.

When to use a replication flow?

Use a replication flow if you want to copy multiple data assets from the same source to the same target in a fast and easy way and do not require complex projections.

One thing to keep in mind is that replication flows copy only certain types of source objects, listed below (a small illustrative sketch follows the list):

    • CDS views (in ABAP-based SAP systems) that are enabled for extraction.

 

    • Tables that have a unique key (primary key)

 

    • Objects from ODP providers, such as extractors or SAP BW artifacts
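For illustration, here is a minimal sketch of what a qualifying plain database table could look like in the SAP HANA Cloud source used later in this scenario: it simply needs a primary key. The sketch uses Python with SAP's hdbcli driver; the host, credentials, schema, and table are hypothetical.

```python
from hdbcli import dbapi  # SAP HANA Python client

# Hypothetical connection details for the HANA Cloud source system
conn = dbapi.connect(
    address="<hana-cloud-host>",
    port=443,
    user="REPL_SOURCE_USER",
    password="<password>",
    encrypt=True,
)
cursor = conn.cursor()

# A plain table qualifies as a replication-flow source only if it has a
# unique key, so the table is created with an explicit primary key.
cursor.execute("""
    CREATE COLUMN TABLE SALES.JOB_POSITIONS (
        JOBID              INTEGER       NOT NULL,
        JOB_CLASSIFICATION NVARCHAR(20),
        DESCRIPTION        NVARCHAR(200),
        PRIMARY KEY (JOBID)
    )
""")

cursor.close()
conn.close()
```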



The new Replication Flow also supports delta loads in addition to the initial load. As of now, it supports minute- and hour-based scheduling, which means the delta load occurs at the specified interval, capturing and replicating changes from the selected source to the target. These capabilities will be extended further.

[Image: Replication delta.png]


Please find the details here: Load Type 

Use case and overview comparison: 


Overview of Connectivity:  SAP HELP – Connection Types Overview

   On the source system side

    • SAP S/4HANA Cloud or S/4HANA on-premises, where we mainly talk about CDS view extraction.

 

    • SAP ECC or Business Suite systems, which we connect via SLT mainly for table-based extraction; the DMIS add-on must be installed (the DMIS add-on is a requirement that brings all the prerequisites into the SAP system and serves as the framework or foundation for using replication flows).

 

    • SAP BW and BW/4HANA integration (different data assets can be exposed via ODP, such as ADSOs, DSOs, and so on).

 

    • SAP HANA Cloud as well as HANA on-premises.

 

    • A non-SAP source: Microsoft Azure SQL Database.



    On the Target system side

    • SAP Datasphere.

 

    • Standalone HANA on-premises and Standalone HANA cloud.

 

    • HANA Data Lake Files.

 

    • Google BigQuery

 

    • Google Cloud Storage

 

    • Amazon Simple Storage Service (Amazon S3)

 

    • Azure Data Lake Storage Gen2



With that, let's jump into the scenario...

In this scenario, I am going to use an SAP HANA Cloud system as the source.

Let's see how to create the connection to the source system.

    • Go to the dedicated Datasphere space and click on "Go to connections".

 

    • Click on Create connection and you can see the list of connection types.

 

    • Click on the information icon of the SAP HANA Cloud connection type; there you go... it supports Replication Flows.




 

    • Select the connection type and provide the information about your source system; that's it, you are good to go...




Now, we will see how to create a Replication Flow in SAP Datasphere.

 

    • Jump into the Data Builder and click on New Replication Flow.





Note: If you don't find "New Replication Flow", please check whether the "SAP Datasphere Integrator" role is assigned to your user.

 

    • To choose the source for replication, click on "Select Source Connection", which shows the connections created in your Datasphere space.



         


 

    • Here, I am going to connect to a HANA Cloud system from which I will consume tables. So, select the connection and continue.






    • The next thing is to choose the "Source Container". The container is like a root path under which the source objects reside (for example, in the case of a database it is the database schema).
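For the HANA Cloud source used here, the container therefore corresponds to a database schema, and the objects offered for replication are the tables in that schema. A hedged sketch (Python with hdbcli; host, credentials, and the schema name SALES are hypothetical) of listing what such a container would expose:

```python
from hdbcli import dbapi

conn = dbapi.connect(address="<hana-cloud-host>", port=443,
                     user="REPL_SOURCE_USER", password="<password>", encrypt=True)
cursor = conn.cursor()

# The "container" of a database connection is the schema, so the candidate
# source objects are simply the tables that live in that schema.
cursor.execute(
    "SELECT TABLE_NAME FROM SYS.TABLES WHERE SCHEMA_NAME = ? ORDER BY TABLE_NAME",
    ("SALES",),  # hypothetical source schema / container
)
for (table_name,) in cursor.fetchall():
    print(table_name)

cursor.close()
conn.close()
```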






    • Here, I am selecting the container so that it shows the list of tables within that schema.




 

    • The final step in setting up the source is to select the source objects from the container chosen in the previous step. For that, click on "Add Source Objects", choose the tables you want, and click on Next.




 

    • In the next screen, select all the objects and click on "Add Selection".




 

    • Next, configure the target by clicking on "Select Target Connection".




 

    • Make sure you choose "SAP Datasphere" as your target when you want to replicate data from SAP HANA Cloud into it.




Note: You also have the option to select other targets; it depends on the connections you created within your Datasphere space. If you have standalone HANA on-premises, standalone HANA Cloud, or HANA Data Lake connections, those will also show up as targets.

 

    • Let's change the names of the replicated tables in SAP Datasphere. If you have existing tables with a similar structure, you can use those tables in place of the auto-generated ones.




 

    • Click on Settings to set the replication behavior; as I mentioned earlier, the replication flow also supports delta extraction. Additionally, there is one more option called "Truncate"; enabling it deletes the existing data in the target structure before the load.



For more details regarding this section, please go through the help document: Load Type
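For intuition, the effect of the "Truncate" option on a database-like target is roughly that of clearing the target table before the data is written, rather than writing into whatever is already there. A hedged sketch of that behavior (Python with hdbcli; host, credentials, space, and table name are hypothetical, and the real work is of course done by the replication flow itself):

```python
from hdbcli import dbapi

conn = dbapi.connect(address="<target-host>", port=443,
                     user="TARGET_USER", password="<password>", encrypt=True)
cursor = conn.cursor()

truncate_enabled = True  # mirrors the "Truncate" switch in the replication settings

if truncate_enabled:
    # With Truncate enabled, existing data in the target structure is removed
    # before the freshly replicated records are loaded.
    cursor.execute('TRUNCATE TABLE "MY_SPACE"."JOB_POSITIONS"')
# Without Truncate, the load writes into the target structure as it is.

conn.commit()
cursor.close()
conn.close()
```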


 

    • Provide the Technical/Business name of the Replication flow and save it.




 

    • Select any of the rows to see the replication properties; from there you can add some projections.




 

    • Here, I want to provide some simple filters; say I want to restrict JOBID and, at the same time, Job Classification. How can we do that? Here we go...



In the JOBID section, select "Between" and provide the low value and high value; once you are done with that, click on "Add Expression". Now select Job Classification, provide a valid input, and click on "Add Expression" again.
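Logically, the two expressions amount to a range condition plus an equality condition on the source table, so the records passing the projection are the same ones a corresponding WHERE clause would return. A sketch of that equivalent filter (Python with hdbcli; the low value, high value, and classification shown are hypothetical):

```python
from hdbcli import dbapi

conn = dbapi.connect(address="<hana-cloud-host>", port=443,
                     user="REPL_SOURCE_USER", password="<password>", encrypt=True)
cursor = conn.cursor()

# The projection filter built in the UI (JOBID between a low and a high value,
# plus a fixed Job Classification) corresponds to a WHERE clause like this:
cursor.execute("""
    SELECT *
    FROM SALES.JOB_POSITIONS
    WHERE JOBID BETWEEN ? AND ?
      AND JOB_CLASSIFICATION = ?
""", (1000, 1999, "FULL_TIME"))  # hypothetical low value, high value, classification

rows = cursor.fetchall()
print(f"{len(rows)} rows would pass the projection filter")

cursor.close()
conn.close()
```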


 

 

 

 

         


 

    • There is one more option called Mapping, where you can change the existing mappings as well as the data types the system has proposed by default, and you can also add new columns to the target table. That's it; once you are done, provide a name for the projection and click on OK.




 

    • The projections we added are listed in the replication flow... with that, we have completed the creation of the replication flow. Let's deploy it.




 

    • We can see that all the tables and the replication flow got deployed...




 

    • Now run the replication flow; you can see a "Run" button.




 

    • With that, a background job starts running; you can check the details of the running job by clicking on the Data Integration Monitor in the tools section.




 


 

    • Once the run is complete, a message is shown in the integration monitor. Now, let's take a look at the tables and see if we can spot some data.
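To double-check the result outside the UI, one option is to query the replicated tables directly. This is only a sketch and assumes a Datasphere database user with read access to the space schema; the host, user, space name, and table names are hypothetical:

```python
from hdbcli import dbapi

# Hypothetical Datasphere database user with read access to the space schema
conn = dbapi.connect(address="<datasphere-host>", port=443,
                     user="MY_SPACE#READER", password="<password>", encrypt=True)
cursor = conn.cursor()

for table in ("JOB_POSITIONS", "EMPLOYEES"):  # hypothetical replicated tables
    cursor.execute(f'SELECT COUNT(*) FROM "MY_SPACE"."{table}"')
    count = cursor.fetchone()[0]
    print(f"{table}: {count} rows replicated")

cursor.close()
conn.close()
```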




 


 

That's it, we did it. Thanks for taking the time to read this article on SAP Datasphere. Hopefully, it has given you a better understanding of one of the key features and how it can help businesses unleash the full potential of their data.

 

 

 

 

 

18 Comments
deodutt_dwivedi
Active Participant
Hi Mastan,

Thanks for the wonderful blog explaining this cloud-based replication technique. Does this mean the end of DP Agent and remote-table-based replication, which also supports real-time replication?

 

Regards,

Deo

 
Mastan
Advisor
Hi Deo,

No, existing remote table replication using the DP Agent will still exist. This new replication tool moves data from one source to one target (the target need not be Datasphere; you can use another target as well; please check the supported targets in the blog). Additionally, we can transfer cleansed data by applying some projections.

Thanks,
Mastan
ShailendarAnugu
Product and Topic Expert
Hi Mastan,

Good blog and nicely written.

 

Thanks,

Shailu.
hasba_younes
Participant
Hello Mastan,

 

Thanks for the good overview of the new replication flows.

I have one question: as you run the job as an initial load, how will the delta load be done?

Are the deltas pushed automatically to the targets, or do we need to schedule the delta loads?

 

thanks & best regards

Younes
Mastan
Advisor
Hi shailendar.anugu2

 

Thank you for the feedback.

 

Best regards,

Mastan
MKreitlein
Active Contributor
Hello Mastan,

Very helpful blog... but one question: what do you mean by this?

"SAP essentially brings in an additional cloud-based replication tool. This cloud-based data replication tool is designed to simplify data integration by eliminating the need for additional on-premises components."

When you try to set up a connection to an ABAP system, you still need an additional on-premises component: the Cloud Connector?!


Thanks, Martin
Mastan
Advisor
Hi Younes,

In order to push the delta loads, the setting should be "Initial and Delta" (please find the screenshot below). With the initial release, the delta duration is fixed to 60 minutes, which means the delta load happens every 60 minutes, capturing and replicating changes from the source to the target.

 


 

Thanks,

Mastan
Mastan
Advisor
Hi Martin,

Thanks for the feedback.

Coming to the question: for cloud-to-cloud use cases, if you want to replicate data from a cloud-based source (such as S/4HANA Cloud) to a cloud-based target (like Datasphere), no on-premises components need to be installed; direct connectivity is used instead. For on-premises scenarios, for example SAP BW, SAP Business Suite, or S/4HANA on-premises, we use the Cloud Connector. But for pure cloud-based replication, this is not needed.

 

Best regards,

Mastan

 
PhilMad
Participant
Hello Mastan, thanks for this informative blog. We have used replication flows in order to fulfill combined requirements which could not be realized with remote table replication or data flows. The delta was determined by a standard CDC delta CDS view, and we could confirm an almost optimal delta determination and forwarding to Datasphere. What we could not observe is an entry in the ODQ of the underlying S/4 system, so it seems that the delta reading must happen outside of ODQ. Could you shed some light on this part? That would be very interesting. Kind regards, Philipp
Mastan
Advisor
Hi Philipp,

Thanks for the feedback,

Can you please check out the transaction DHCDCMON, which is used to monitor the replication of CDS views via the CDC engine? It provides information on the status of the replication process, including whether it is running, completed, or failed. It also provides detailed information on any errors that may have occurred during the replication process.

 

Best regards,

Mastan

 
stefan_merz
Explorer
Hello Mastan, thanks for the informative blog.

My understanding is that in replication flows, CDS views as well as SAPI extractors can be used. We established a connection to an S/4 on-premises system. When we create a replication flow for this connection, we only see the CDS container and no container for SAPI extractors. At least one ODP-released DataSource (0CUSTOMER_TEXT) is active and available in the S/4 system.

Do we have to do additional customizing in order to use SAPI-extractors?

Best regard,

Stefan
MKreitlein
Active Contributor
Hello Mastan,

Very helpful hint about the transaction DHCDCMON... do you know the equivalent (app?) for S/4HANA Cloud Public?

I created a replication flow to extract from I_CUSTOMER and it "failed with error"... but in the log the runtime is still updated on every refresh. Even after 30 minutes there is no real error message, no abort, and no error log.

I will let it run over the weekend and check on Monday again... but the respective App for the same t-code would be very helpful. I could not find any in the Apps library.

Thanks, Martin
rajeshps
Participant
reddy07
I don't understand what you are trying to do here.


Is it possible in SAP DI on-premise?
Rambabu2
Explorer
Hi Mastan,

Thanks for the very informative blog.

Can you please help me with your advice on the SAP ECC connection?

When we connect from SAP Datasphere to ECC (non-HANA), is it possible to extract/replicate data using ODP extractors (all BW standard DataSources) to build models in Datasphere, just like in BW?

Can you please point me to it if you have covered this in any other blog?

Thanks and regards.

Rambabu

 
Mastan
Advisor
vagarwal1
Explorer

Hi Team,

While creating a new connection for a data flow or replication flow, we get the options below:

vagarwal1_2-1708543528732.png

 

I opted for the SAP ABAP connection and I could see that the feature is enabled:

vagarwal1_1-1708543425061.png

 

But while validating the connection, I get the error below.

vagarwal1_0-1708543243463.png

Does this mean that both data flow and replication flow are enabled only based on a cloud connection and not by using the Data Provisioning Agent?

 

Regards,

Vikas Agarwal

 

Cocquerel
Active Contributor

Don't you miss Kafka as a potential target, as described in this blog?
https://community.sap.com/t5/technology-blogs-by-sap/sap-datasphere-replication-flows-blog-series-pa...

albertosimeoni
Participant

Hello everyone,

After a year of "denying it", I tried to test these replication flows.

What I see is that they are basically useless for most use-case scenarios.

The problem I see is this:

For every change in an init + delta replication flow, you need to stop it before deploying the changes.

The only advantage they have over a normal data flow is the init + delta mode, so this is the only use case for a replication flow.

The "stop" state means all delta information is gone; the next start will drop the content of the target tables and restart with an INIT extraction.

What does this mean in practice? (Let's imagine 20 different sources as CDS views inside the replication flow.)
1) If you need to change a source (just add a single column) => all the sources will be re-extracted from ZERO.

2) If you need to refresh a data type inside a source => all the sources will be re-extracted from ZERO.
3) If something goes wrong and the replication flow stops => all the sources will be re-extracted from ZERO.
4) If you need to add a source (imagine you developed SD reporting and the customer asks you to develop FI reporting) => all the sources will be re-extracted from ZERO.

For me, unless they become more flexible about changes, there is absolutely no purpose in using replication flows.

And the sad thing is that they are the only method for delta extraction via ODP in which you can decide the run interval (so that rules out real-time remote tables).

Is there a way to add tables to the sources of an init + delta replication flow without stopping it?