Technology Blogs by SAP
raphael_walter
Advisor


Hello everyone,



Here is another request from one of my customers, actually from several of my customers: how can you extract data from an ECC or S/4HANA system into Google BigQuery with delta loads? As you'll see, this is simple and straightforward with SAP Data Intelligence; you can perform initial loads, delta loads, or replication (initial load plus delta loads). For one customer, the target was actually Google Cloud Storage. In that case it is even easier: you can rely on the SAP Data Intelligence RMS feature and directly connect your source to the target with replication.



In any case, enough talking. Watch the following video and I will demonstrate how you can quickly set this up. Of course, you will need access to an ECC or S/4HANA system with SLT or the DMIS add-on installed, and a Google Cloud account for Google BigQuery and Google Cloud Storage (needed for staging the data). Please note that I will not go into the details of the SLT configuration. I'd like to take the time to give credit to abdelmajid.bekhti: he showed me how to do this, provided me with all his systems, and gave me access. I wouldn't be able to show you anything without his help and knowledge.





 

 

Ok, now let's get started! 🙂



First, you need to configure the connections to your systems in SAP Data Intelligence. We need to configure the SLT, ECC (or S/4HANA), Google Cloud Storage, and Google BigQuery connections. To do that, click on the Connections tab.


Let's start with the SLT connection.





As you can see here, we're using the SAP Cloud Connector to connect to our on-premise SLT system.



Then let's have a look at our connection to Google BigQuery.


And we also need to create a connection to our Google Cloud Storage (SAP Data Intelligence will use Google Cloud Storage as a staging area before delta loading into Google BigQuery).



Once all these connections are correctly set up, we can have a look at the SLT configuration. We connect to our SLT box and launch transaction LTRC (SAP LT Replication Server Cockpit).


As you can see, we are replicating table MARA from our ECC system, and this is done in real time.




Once again, I will not go through the details of setting up SLT, but everything is done correctly and we can proceed. As you saw, in this demo we are going to replicate table MARA. The next step is to create a target table in Google BigQuery; for this you need to extract the data structure of the table you wish to replicate from your ERP system. You can refer to this blog to see how to extract the table structure: Downloading Data Dictionary Structure into local file



 

In my case, I used DD03VT in SE11 to extract the structure of table MARA. I didn't reproduce the exact data types and simply used strings, and I had to modify some of the field names because I could not create fields with '/' in their names in Google BigQuery tables.
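As an illustration, here is a minimal sketch of how such a target table could be created with the google-cloud-bigquery Python client. The project and dataset names and the reduced column list are assumptions for this example; in practice you would declare every MARA field you extracted, renaming any that contain '/'.

```python
from google.cloud import bigquery

# Assumed project, dataset and table names -- adjust to your environment.
client = bigquery.Client(project="my-sap-project")
table_id = "my-sap-project.sap_replication.MARA"

# A few sample MARA fields, all declared as STRING to keep the mapping simple.
# SAP field names containing '/' (namespace fields) have to be renamed, since
# BigQuery column names do not allow that character.
schema = [
    bigquery.SchemaField("MANDT", "STRING"),
    bigquery.SchemaField("MATNR", "STRING"),
    bigquery.SchemaField("BISMT", "STRING"),  # old material number
    bigquery.SchemaField("MTART", "STRING"),  # material type
    bigquery.SchemaField("MATKL", "STRING"),  # material group
]

table = client.create_table(bigquery.Table(table_id, schema=schema))
print(f"Created table {table.full_table_id}")
```

Declaring everything as STRING keeps the mapping trivial, at the cost of casting types in downstream queries.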

 



Now let's look at the pipeline. As you will see, it is incredibly simple. We are using Generation 2 operators; with Generation 1, the data pipeline would be more complex. Just to explain, SAP Data Intelligence Generation 2 operators are based on Python 3.9, whereas Generation 1 operators are based on Python 3.6.





We are using the "Read Data from SAP System" operator to connect to SLT and the Cloud Table Producer operator to connect to Google Big Query (this operator can connect to Google Big Query or Snowflake).



 

Let's look at the configuration of our Read Data from SAP System operator.


Choose the correct connection, in our case the SLT connection shown earlier, and then click on Object Name to select the Mass Transfer ID and the table you want to replicate.


Then you need to choose the replication mode: either Initial Load, Delta Load, or Replication (both initial and delta loads).


In our case, we chose Replication. Now on to the Cloud Table Producer operator. Click on the configuration.


Click on the Target.


Now we need to configure all the information here. For the service, choose Google BigQuery (the other option is Snowflake). For the connection ID, choose your Google BigQuery connection, and for the target, the table that you created in your Google BigQuery system.



 

For the staging connection ID, choose your Google Cloud Storage connection. For the staging path, simply choose where the staging table will be created.
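If you want to check what lands in the staging area, here is a small sketch using the google-cloud-storage Python client; the bucket name and prefix below are assumptions standing in for whatever you configured as the staging path.

```python
from google.cloud import storage

# Assumed bucket and staging prefix -- use the values from your
# Cloud Table Producer configuration.
client = storage.Client(project="my-sap-project")
bucket_name = "my-di-staging-bucket"
prefix = "staging/mara/"

# List the files SAP Data Intelligence writes to the staging path
# before they are loaded into Google BigQuery.
for blob in client.list_blobs(bucket_name, prefix=prefix):
    print(blob.name, blob.size, blob.updated)
```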



 

In the target columns, you can auto-complete the mapping by clicking on the button indicated below. For fields that have different names, you have to do the mapping by hand.


 

Now we need to save and start our data pipeline. Click on "Run as".


 

 

Give a name to your graph and click on Capture Snapshot to take a snapshot every x seconds in case of a problem. You can then choose either automatic or manual recovery. Click on OK and launch the graph.


Your pipeline should be up and running in a minute or so.


 

 

Now let's go to our ECC system and launch transaction MM02 to change a material. I'll change material number 1177.


Select both Basic Data 1 & 2 and continue.


Now we're going to modify the old material number.


I'll change it to 1216. Let's save and head over to Google BigQuery.


 

 



The old material number was modified and pushed to Google BigQuery.
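To double-check the delta from outside the BigQuery console, a short query against the target table shows the updated record. The dataset and table names, and BISMT as the column holding the old material number, are assumptions based on the target table sketched earlier.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-sap-project")

# Check the old material number (BISMT) of material 1177 after the delta load.
# Depending on how the source formats MATNR, the value may be zero-padded.
query = """
    SELECT MATNR, BISMT
    FROM `my-sap-project.sap_replication.MARA`
    WHERE MATNR = @matnr
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("matnr", "STRING", "1177")]
)

for row in client.query(query, job_config=job_config).result():
    print(row.MATNR, row.BISMT)  # expect BISMT = '1216' after the change
```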



As you can see, this was very easy to set up. Once again, I have done this with an ECC system, but it could also be done with an S/4HANA system just as easily. Again, thank you abdelmajid.bekhti for your systems and your help in configuring this.



I hope this blog was useful to you. Don’t hesitate to drop a message if you have a question.



 

11 Comments
samsam2022
Explorer

Thanks Raphaël and Abdelmajid for this amazing work. That's just perfect!

MustafaBensan
Active Contributor

Hi Raphael,


This is a good example of direct table replication. For an analytics/data warehouse use case, though, it means that with multiple tables the relationships and business logic would need to be modelled from scratch.


As a suggestion for a follow-up post, what would be even more beneficial in this scenario would be to show how to replicate data to Google BigQuery using the ODP extractors already built into ECC and S/4HANA, again using SLT and Data Intelligence as you have done here.  This way, we can take advantage of the "analytics-ready" data structures and business logic already provided by the ODP extractors (as used by standard business content for SAP BW/4HANA and presumably SAP Data Warehouse Cloud).

Regards,

Mustafa.    

Cocquerel
Active Contributor

Google is providing an SLT plug-in to replicate directly from SLT to BigQuery (see https://cloud.google.com/data-fusion/docs/how-to/use-sap-slt-plugin).
What are the pros and cons of this solution compared to going via SAP Data Intelligence as you described in this blog?

lsubatin
Active Contributor

Hi Mustafa,


If you don't want to model from scratch, you can check out the Cortex Framework.


The benefit of table-by-table replication vs ODP or CDS-based replication is that the footprint in the landing zone is smaller (ODP will replicate the same piece of information multiple times) and you can get eventual consistency.


With pre-aggregated data and ODP, you need to do some legwork in terms of future-proofing and compatibility, as extractors and mechanisms tend to change, get customized, or get deprecated across versions (tables remain much more stable in contrast).


Another big issue is that consistency at the business process level is much harder to achieve when the same entities are replicated multiple times but at different points in time (e.g., when you have a sales order extractor and a delivery extractor flowing data separately, there is a chance that the statuses between the two extractors are out of sync; this is easier to manage with tables). SLT, on the other hand, can even allow for near real-time consistency (there's a direct connector created by Google that can do this too).


Last but not least, one of the big use cases for using BigQuery to complement SAP data is heavy machine learning processing and joining with other data, so having a lower level of granularity and the ability to time travel and take snapshots is known to be beneficial.


@raphael.walter, sorry to hijack the comment section of your amazing blog post. I was looking for something like this and you popped up in my feed. Thanks for all the details, this is very helpful stuff!


Thanks!


 


Lucia


 

raphael_walter
Advisor

Hello Mustafa,


Thank you for your comment. You can actually use SAP Data Intelligence directly with ODP extractors; there is no need for SLT. For that, you can create a data pipeline in the Modeler just like in this blog and use the Read Data from SAP operator, connecting directly to an ECC or S/4HANA system.


The other way is to use RMS (Replication Management Service), which I'm showing in the video at 9:12. You can't use it to target Google BigQuery, but you could target Google Cloud Storage. 🙂


Hope this is helpful; let me know if you need more information.


Best regards,


 


Raphaël

raphael_walter
Advisor

Thank you, Lucia, for your comment. No problem, this is a place of exchange; I'm glad that this demonstration was also useful for you. 🙂


Best regards,


Raphaël

raphael_walter
Advisor

Hello Michael,


Thank you for your comment. I will not compare the solution of a partner like Google to SAP Data Intelligence. 🙂 The only thing I will say is that, in terms of licensing, you need a full enterprise SLT license to extract data to a third-party solution like the one from Google, whereas with SAP Data Intelligence you have a runtime SLT license allowing you to extract the data.


Also, in that case, you have the flexibility to use all the features of SAP Data Intelligence: not only data orchestration, but also data cataloguing and data science. 😉


Best regards,


 


Raphaël

armaansingla1992
Explorer

Hi Raphael,


How can we extract the data with ODP extractors directly, without a BW system? Is there any option to directly use ABAP ODP CDS in Generation 2 for extracting the data?


Thanks for all the details.


Regards,


Arnaan

Tommaso
Explorer

Hello Raphael, 


Just to be sure: does this mean that with SAP Data Intelligence a runtime SLT license is included at no additional cost?


Best regards,


Tommaso

raphael_walter
Advisor

Hello Tommaso,


Yes, SAP Data Intelligence includes a runtime license of SAP Landscape Transformation Replication Server, as long as it is used only for exchanging data with SAP Data Intelligence, as mandated by the SUR (SAP Software Use Rights).


Best regards,


Raphaël

raphael_walter
Advisor

Hello Arnaan,


Please refer to this blog from daniel.ingenhaag to answer your question.


If it is not clear, do not hesitate to ask again here or on his blog.


Best regards,


 


Raphael