Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
tobias_koebler
Advisor

This blog is part of a blog series from SAP Datasphere product management focusing on the Replication Flow capabilities in SAP Datasphere:

In the first detailed blog, you learned how to set up a replication within a Replication Flow and got more details about the different settings and the monitoring.

Introduction to premium outbound integration


With the mid-November 2023 release of SAP Datasphere, several new target connectivity types became available. With a clear focus on strengthening integrated customer scenarios, we named all external connectivity "premium outbound integration" to emphasize data integration with external parties. It enables data movement into the object stores of the major providers as well as into Google BigQuery as part of the SAP-Google analytics partnership. The following connectivity options are now available:

    • Google BigQuery as part of the endorsed partnership
    • Google Cloud Storage (GCS)
    • Azure Data Lake Storage Gen2
    • Amazon Simple Storage Service (AWS S3)

Connectivity Overview November 2023


With this new enhancement, customers can broaden their use cases and add integration with object stores to their scenarios. SAP Datasphere, as a business data fabric, enables communication between the SAP-centric business data "world" and data assets stored externally. Premium outbound integration is the recommended way to move data out of any SAP system to external targets.

 

Integration with Google BigQuery


To highlight one of the most requested replication scenarios, we want to give you a step-by-step walkthrough of a Replication Flow from SAP S/4HANA to Google BigQuery.

For this blog we assume that the Google BigQuery connection has already been created in the space by following the step-by-step guide in the SAP Datasphere documentation: BigQuery Docu @ help.sap.com

Specify Replication Flow


In our scenario we defined an SAP S/4HANA system as the source connection, and we want to replicate the following CDS views:

    • Z_CDS_EPM_BUPA
    • Z_CDS_EPM_PD
    • Z_CDS_EPM_SO


As Load Type we selected: Initial and Delta

 

Source system and selected CDS views


 

In the next step we select our pre-defined connection to Google BigQuery as the target system.

 

Target system BigQuery


 

Afterwards we need to select the target dataset by navigating to the container selection.

 

Select target dataset


In this example we choose the dataset GOOGLE_DEMO, which already exists in BigQuery.

Note: In Google BigQuery terminology, the container selection in your Replication Flow corresponds to the datasets that are available in your Google BigQuery system. Therefore, we will use the term dataset in the upcoming paragraphs when we talk about the target container in this sample scenario.
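As a side note, the target dataset can also be prepared or inspected outside the SAP Datasphere UI. The following is a minimal sketch using the google-cloud-bigquery Python client; the project id "my-gcp-project" and the location "EU" are placeholders, and the snippet is not part of the Replication Flow setup itself.

from google.cloud import bigquery

# Connect to BigQuery; authentication uses the usual Google Cloud
# application default credentials.
client = bigquery.Client(project="my-gcp-project")  # placeholder project id

# Create the GOOGLE_DEMO dataset if it does not exist yet; the Replication
# Flow expects the target container (dataset) to be available in BigQuery.
dataset = bigquery.Dataset("my-gcp-project.GOOGLE_DEMO")
dataset.location = "EU"  # placeholder; use the location that matches your setup
client.create_dataset(dataset, exists_ok=True)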

 

 

Target dataset


 

The target dataset GOOGLE_DEMO is set and now we can use the basic filter and mapping functionality you know from Replication Flow.

Let us have a quick look at the default settings by navigating to the projection screen.

 

Navigate to Projections


 

After navigating to the Mapping tab, you will see the derived structure which can be adjusted.

 

Structure Mapping


In addition, you will also see three fields that cannot be changed:

    • operation_flag: indicates the operation executed on the source record (insert, update, delete, etc.).
    • recordstamp: timestamp of when the change happened.
    • is_deleted: indicates whether the record was deleted in the source.


These three fields are automatically created in the target structure and filled by the system; they can be used depending on the information you require in a certain use case.
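Because the write mode is Append (see the target settings below), the BigQuery table grows as an append-only change log, and these three technical columns are what you use to reconstruct the latest state of the data. The following is a hedged sketch using the google-cloud-bigquery Python client; the key column BUPA_ID, the project id my-gcp-project, and the is_deleted comparison value are assumptions that you would replace with the actual keys and mapping of your CDS view.

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project id

# Keep only the most recent change per business key and drop deleted records.
# BUPA_ID is a hypothetical key column of Z_CDS_EPM_BUPA.
sql = """
SELECT * EXCEPT(rn)
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY BUPA_ID
                            ORDER BY recordstamp DESC) AS rn
  FROM `my-gcp-project.GOOGLE_DEMO.Z_CDS_EPM_BUPA` AS t
)
WHERE rn = 1
  AND is_deleted = FALSE  -- hypothetical flag value; adapt to your mapping
"""

for row in client.query(sql).result():
    print(dict(row))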

Besides the standard adjustments that can be made to structures, there are some special target settings that can be accessed via the settings icon back on the main screen.

 

BigQuery target settings


The Write Mode is set to Append by default. In this release, the append API from Google BigQuery is used; further APIs will be considered depending on their availability.

Depending on their length, decimals can be clamped by activating the Clamp Decimals setting. This can also be activated for all objects in the Replication Flow.
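Conceptually, clamping means that a decimal value which does not fit into the range supported by the target is set to the nearest boundary instead of failing the record. A tiny illustrative sketch follows; the bounds are examples, not the exact limits used by the flow.

from decimal import Decimal

def clamp(value: Decimal, lo: Decimal, hi: Decimal) -> Decimal:
    # Force the value into the closed interval [lo, hi].
    return max(lo, min(hi, value))

# Example bounds only; the real limits depend on the target data type.
print(clamp(Decimal("1E+40"), Decimal("-1E+38"), Decimal("1E+38")))  # -> 1E+38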

You find a comprehensive explanation in our product documentation: help.sap.com

 

Deploy and run data replication


As the next step, the Replication Flow can be deployed, and afterwards we start it.

 

Run Replication Flow


 

This starts the replication process, which can be monitored in the monitoring environment of SAP Datasphere. This was illustrated in our first blog, so we will jump directly to the BigQuery environment and have a look at the moved data.

BigQuery Explorer using Google Cloud Console

After navigating to our dataset GOOGLE_DEMO, we find the automatically created tables and select Z_CDS_EPM_BUPA to have a look at its structure.
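If you prefer to check this from a script rather than the Google Cloud Console, a minimal sketch with the google-cloud-bigquery Python client (the project id is a placeholder) could look like this:

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project id

# List the tables the Replication Flow created in the target dataset.
for table in client.list_tables("my-gcp-project.GOOGLE_DEMO"):
    print(table.table_id)  # e.g. Z_CDS_EPM_BUPA, Z_CDS_EPM_PD, Z_CDS_EPM_SO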

 

Dataset structure in BigQuery


 

The data can be displayed by selecting Preview.
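The same preview can also be done programmatically; again a sketch with a placeholder project id:

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project id

# Fetch a handful of rows from the replicated table, including the technical
# columns operation_flag, recordstamp and is_deleted added by the flow.
for row in client.list_rows("my-gcp-project.GOOGLE_DEMO.Z_CDS_EPM_BUPA", max_results=10):
    print(dict(row))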

 

 

Preview replicated data


 

In this blog you gained insights into the new premium outbound integration functionality offered by SAP Datasphere, the recommended way to move data out of the SAP environment.

Extending the connectivity to object stores and Google BigQuery opens up significant new opportunities.

You can also find some more information about the usage of Google BigQuery in Replication Flows in our product documentation: help.sap.com

Always check our official roadmap for planned connectivity and functionality: SAP Datasphere Roadmap Explorer

Thanks to my co-author daniel.ingenhaag and the rest of the SAP Datasphere product team.

9 Comments
00022111734
Participant
Thanks for the insightful blog
BenedictV
Active Contributor
Hi @Tobias Koebler, does the "premium outbound integration" have a separate cost associated with it?
bala_ram2
Discoverer
Excellent. This is how technical blogs should be written.
tobias_koebler
Advisor
Thanks!
tobias_koebler
Advisor
Appreciated 🙂
tobias_koebler
Advisor
Hi, there is some additional charge for it, as it requires compute and will add value. Please contact your SAP account counterpart for more details.

I do not have the figures and would like to keep the blog more on a technical level:)

Best, Tobi
callanloberg23
Explorer
Thanks Tobias & Dan!

Snowflake and Databricks are not explicitly called out as supported target systems. Can you specify what a replication flow may look like in a scenario where you just use SAP Datasphere as the integration layer (i.e., no data storage) to one of those two products? I think this is particularly relevant for Databricks since it is a strategic partner product of SAP Datasphere.

Secondly, can you speak to what semantic data is available to be carried over to these non-SAP targets in the premium outbound integration scenario?

Thanks, and great blog series!

Callan

 
ashishl
Explorer
Excellent, a much-awaited and game-changing feature, though with additional cost.

It's hard to justify extra capacity units for premium outbound, introduced for non-SAP targets, when the client is already paying for storage and the compute engine. If more memory is needed for transferring data, it will automatically require an increase in compute engine capacity. Hopefully SAP will re-think this.

Thank you.

Ashish
nagendra_hamsala
Explorer

Hi Tobias,

Thanks for the informative blog.

QUESTION: When we are able to extract data from S/4HANA to AWS or Azure using Glue or ADF through OData services, why should I use this premium outbound integration through Replication Flows in SAP Datasphere?

Cheers,

Nag