Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
karishma_kapur
Employee

Background


SAP Datasphere released replication flows starting from SAP Datasphere version 2021.03. This new capability allows you to copy multiple tables from a source to a target quickly and seamlessly. For more information on replication flows, please refer here.

This blog will demonstrate how to replicate data from SAP sources to Google BigQuery.

Steps



  1. To start, you will need to create a connection in SAP Datasphere to Google BigQuery. Please refer to Step 4, “Creating Google BigQuery Connection”, in this blog to make your connection.

  2. Make sure you have a dataset in Google BigQuery that you want to replicate the tables into.

  3. Make sure you have a source connection. In this case, we will be using SAP S/4HANA Cloud. You will need to create this connection in the Connections tab in SAP Datasphere.

  4. Navigate to SAP Datasphere, and click on Data Builder on the left panel. Find and click the “New Replication Flow” tile.

  5. Click on Select Source Connection.

  6. Choose the source connection you want. We will be choosing SAP S/4HANA Cloud.

  7. Click Select Source Container.

  8. Choose CDS Extraction – CDS Views Enabled for Extraction and then click Select.

  9. Click “Add Source Objects” and choose the views you want to replicate. You can choose multiple views if needed. Once you have finalized the objects, click Add Selection.

  10. Now, we select our target connection. We will be choosing Google BigQuery as our target. If you experience any errors during this step, please refer to the note at the end of this blog.

  11. Next, we choose the target container. Recall the dataset you created in BigQuery in step 2; this is the container you will choose here.

  12. In the middle selector, click “Settings” and set your load type. Initial Only loads all selected data once. Initial and Delta means that after the initial load, the system checks the source every 60 minutes for any changes (delta) and copies those changes to the target.

  13. Once done, click the Edit Projections icon on the top toolbar to set any filters and mappings. For more information on filters and mappings, please refer here and here.

  14. You can also change the write settings for your target through the settings icon next to the target connection name and container.

  15. Finally, rename the replication flow to a name of your choosing in the right details panel. Then save, deploy, and run the replication flow using the top toolbar icons. You can monitor the run in the Data Integration Monitor tab on the left panel in SAP Datasphere.

  16. When the replication flow is done, you should see the target tables in BigQuery. Note that every table will have three columns added by the replication flow to allow for delta capturing: operation_flag, recordstamp, and is_deleted.
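To illustrate how the three added columns enable delta capturing, here is a minimal Python sketch of how a consumer could reduce replicated delta rows to the current state of a table. The column names come from the step above; the key column name, the dict row layout, and the boolean is_deleted marker are assumptions for this sketch (in your target the marker may be typed differently):

```python
def current_state(rows, key="ID"):
    """Reduce replicated delta rows to the current state of a table.

    Each row is a dict that includes the columns a replication flow adds
    (operation_flag, recordstamp, is_deleted). Per key, only the newest
    record (highest recordstamp) counts; if that newest record is
    flagged deleted, the key is dropped entirely. The key column name
    and the boolean is_deleted marker are assumptions for this sketch.
    """
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row["recordstamp"] > latest[k]["recordstamp"]:
            latest[k] = row
    return [r for r in latest.values() if not r["is_deleted"]]
```

For example, if a key received an insert followed by an update, only the update survives; if it later received a delete record, the key disappears from the result.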


Note: You may have to include the Premium Outbound Integration block in your tenant to deploy the replication flow.
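Once the data has landed in BigQuery, the same delta columns can be collapsed directly in SQL. The helper below is a hedged sketch that builds a BigQuery query keeping only the newest, non-deleted record per key (QUALIFY and ROW_NUMBER are standard BigQuery SQL; the project, dataset, table, and key names are placeholders, and the is_deleted comparison assumes a BOOL column, which may differ in your target):

```python
def latest_rows_sql(project: str, dataset: str, table: str, key: str) -> str:
    """Return a BigQuery SQL statement that reduces replicated delta
    rows to the current state of a table: per key, keep only the newest
    record by recordstamp, and drop the key if that record is flagged
    deleted. All identifiers are caller-supplied placeholders.
    Evaluating both conditions in QUALIFY (after the window function)
    ensures a delete record removes the key instead of resurrecting an
    older version of the row."""
    return (
        f"SELECT * FROM `{project}.{dataset}.{table}`\n"
        f"WHERE TRUE\n"
        f"QUALIFY ROW_NUMBER() OVER "
        f"(PARTITION BY {key} ORDER BY recordstamp DESC) = 1\n"
        f"  AND is_deleted IS NOT TRUE"
    )

# Example with hypothetical names:
sql = latest_rows_sql("my-project", "sap_replication", "SalesOrders", "SalesOrderID")
```

The generated statement can then be run in the BigQuery console or through any BigQuery client.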



Conclusion


You have now successfully created a replication flow from SAP S/4HANA Cloud to Google BigQuery. If you have any questions, please leave a comment below. Thank you!
2 Comments
jack_tee2
Hi karishma_kapur,

Great to see that Datasphere has added the hyperscaler connection types. A question on the source side of the replication flow: when you run the replication of the 4 tables, does it consume (a minimum of) 4 active processors in the origin source system (S/4HANA) while the replication flow is running?

Thanks in advance.

 
shantanu_a_patil
Strategically, will SAP build an S/4HANA replication connector to BigQuery directly? If that happens, will Datasphere still be needed, and why?