


In my previous post I promised to show you the combination of the ABAP CDS Reader and the Python operator. Here we are. You can use the approach below for the SLT and ODP operators too, because the JSON output of the operators is the same. In this blog post we are going to extract a standard CDS view from SAP S/4HANA Cloud into the SAP Data Intelligence Data Lake. And, most importantly, we want to see headers in the file. This topic is more complicated than the first one, but I’m sure you will get it 😊 Why did I pick this topic? Firstly, I know that some colleagues have encountered this problem. Secondly, there isn't a standard solution for it at the moment. Moreover, this approach will help you to select columns or do some data preparation if needed. So, let's go!

For the CDS view extraction from SAP S/4HANA Cloud we are going to use the ABAP CDS Reader operator in DI. It is a standard operator used to read data from a CDS view. There is a standard template “CDS View to File” used for data extraction from a CDS view into a CSV file (located in the Data Lake in DI):



Problem description:


However, using this pipeline you will not see any headers in the file. It’s bad…very bad ☹


 

There are three versions of the ABAP CDS Reader operator. The main difference between them is the output data type. But regardless of which version you use, the headers are still not visible in the file. This is the problem we are going to solve in this post.

Solution:


Among these three versions, only ABAP CDS Reader V2 has information about the fields in its output. So, we are going to use this version. The output data type of this operator is message. A message has a body and attributes. The body of the output delivers the data, and the attributes carry information about the fields, the last batch, etc. (see image below). Because there is no standard solution to get headers, we must use a custom operator. In this post I’m going to use a custom Python operator. If you don’t know what that is, check my previous blog post.

Below you can see an example of what the attributes look like (I_COSTCENTERTEXT view):


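In code terms, the relevant part of such an attributes object could look roughly like the sketch below (purely illustrative; the exact structure and field names depend on your view and release):

attributes = {
    'ABAP': {
        'Fields': [
            {'Name': 'CostCenter'},
            {'Name': 'Language'},
            {'Name': 'CostCenterName'}
        ]
        # ... further metadata, e.g. data types and a last-batch flag
    }
}
# the header list we are after
headers = [field['Name'] for field in attributes['ABAP']['Fields']]
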
I’m going to read the headers from 'ABAP' -> 'Fields' -> 'Name'. The pipeline looks like this:


I’m not going to go through the back-end setup; we will concentrate on Data Intelligence. Let’s create our Python operator:

Step 1 – Create input and output ports for the Python operator


Input port: name – inData, type – message

Output port: name – outString, type – string

Step 2 – Add some Python magic into the "Script"


from io import StringIO
import pandas as pd
import json

def on_input(inData):
    # read body
    data = StringIO(inData.body)
    # read attributes
    var = json.dumps(inData.attributes)
    result = json.loads(var)
    # from here we start json parsing
    ABAP = result['ABAP']
    Fields = ABAP['Fields']
    # creating an empty list for headers
    columns = []
    for item in Fields:
        columns.append(item['Name'])
    # data mapping using headers & saving into a pandas dataframe
    df = pd.read_csv(data, index_col=False, names=columns)
    # here you can prepare your data,
    # e.g. column selection or record filtering
    df_csv = df.to_csv(index=False, header=True)
    api.send('outString', df_csv)

api.set_port_callback('inData', on_input)

 

Note that you should take care of data types when using pandas in Python; I don’t cover that in the code. Moreover, don’t forget about the last batch if you need it in your use case. In addition, be aware that the “on_input” function is executed for each data package. So, this approach is appropriate for replication/delta mode (in an initial load the JSON of the attributes looks different after the last package, the parsing can no longer be done, and the pipeline will fail after the last package). But this code can be adapted for an initial load with only a couple of additional lines. Take it as an exercise 😊  You can also use this approach if the CDS view’s structure changes. And, of course, this code can be optimized.
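If you want a starting point for that exercise, the sketch below shows one possible way to guard the header parsing, assuming the last-batch information arrives as the 'message.lastBatch' attribute (as discussed in the comments); treat it as a rough idea, not a finished implementation:

def on_input(inData):
    attributes = inData.attributes
    # assumption: the reader flags the final package via 'message.lastBatch'
    last_batch = str(attributes.get('message.lastBatch', '')).lower() == 'true'
    if last_batch and 'ABAP' not in attributes:
        # final package of an initial load: no field metadata, nothing to parse
        return
    # ... continue with the header parsing shown above
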

Step 3 – Upload a cool icon for your Python operator and save it.


Configure the “ABAP CDS Reader V2” and “Write to File” operators, save and run your pipeline. You are almost done 😊


Your output with the headers:

 


 

Congratulations! From now on, you won’t have any trouble with CDS headers 😊  If you have any questions or ideas for the next blog post, feel free to let me know in the comments. Stay tuned!


How to connect SAP S/4HANA with SAP Data Intelligence:

Integrate SAP Data Intelligence and S/4HANA Cloud to utilise CDS Views – Part 1

How To Integrate SAP Data Intelligence and SAP S/4HANA AnyPremise
9 Comments
Ian_Henry
Product and Topic Expert
Great blog Yuliya, thank you for sharing.

I have implemented that in my scenario.

There's one error in your pasted code, the line below needs to be indented with an extra tab.
columns.append(item['Name'])   

Cheers, Ian.
Hi Ian,

Yes, of course, you are right. Thanks for pointing it out, I will correct it.

Regards,

Yuliya
former_member255270
Active Participant
Yuliya, you are on fire with all these super useful DI blog posts!
Hey Yuliya, very informative blog. Thank you for sharing!

 

Can you give me an idea what changes we can make to the code so that it will work for the initial load as well?

P.S.: I have the same structure of CDS view as well.

It will be a great help from you!

 

Thanks,

Samarth
abhimanyu_sharma
Contributor
Hi Yuliya,

 

I have created a graph where I am reading data from a BW DSO via the ODP ABAP Reader V2 and writing the data to a CSV file via the Write File operator in Azure Data Lake. But the issue I am facing here is that the graph is fetching only the first package of records in init+delta mode and I am not getting all the records.

Could you please help me with this? I have copied your Python code for the headers and it works fine for me, but I am not able to get all the records, which you said we need to handle in the code. Could you please help with this?


 
Hi Samarth,

 

For the initial load you can use the message.lastBatch information. You can read it from the attributes of each package. If message.lastBatch=True, the initial load is done.

 

Regards,

Yuliya
Hi Abhimanyu Sharma,

The reason is that you have the Terminator directly after Write File. After the first package, Write File sends the event to the Terminator -> the pipeline is terminated after the first package. For example, if you have an initial load use case, you have to check whether the last package has been received or not (use the message.lastBatch information). If yes -> send an event to the Terminator. So you need to add one more operator between Write File and the Terminator for this check.
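A rough sketch of such a check operator could look like this (assuming a message input port 'inData', an output port 'stop' wired to the Graph Terminator, and that the last-batch flag arrives as the 'message.lastBatch' attribute):

def on_input(inData):
    # forward a termination signal only once the final package has arrived
    if str(inData.attributes.get('message.lastBatch', '')).lower() == 'true':
        api.send('stop', api.Message('done'))

api.set_port_callback('inData', on_input)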

Regards,

Yuliya
Nice blog Yuliya,

however I have just started on DI recently and have a requirement to read S4 tables like LFA1, TCURC etc. into a table in a HANA database.

So I just need to run the initial load once.

Could you please provide detailed code for the initial load? With your above comment, it's very difficult to find out where to add the line or what to modify.

 
Hi all,

I am new to DI, can anyone please share code for the initial load?

My requirement is to do an initial load and read the data of a few tables in S4 like LFA1 etc. one time and dump it into a HANA database.