Here is a tedious problem: You have a pipeline that can extract data from a single table and write it as a file into an object store. But you would also like to run this same pipeline against hundreds of other tables. It would be extremely cumbersome to re-create the same pipeline for each table you wanted to extract.

In this blog post I will use the concept of parameterizing a process execution -- which is documented here -- and apply it to my single SLT replication pipeline template. This effectively scales the pipeline and allows it to run simultaneously against a large number of different tables. I use the SLT Connector operator as an example, but the same methodology of using ${variables} can be applied to any other parameterized operator in SAP Data Intelligence.

Prerequisites: For this blog post I have already configured my SLT replication server, created an ABAP connection in the Connection Manager, and modeled a fully working pipeline that reads from this ABAP connection and writes a single table to an object store such as S3.

Step 1: Declare variables in the relevant operator(s)


Enter the variable name ${tableName} in the Table Name field of the SLT Connector operator.


Enter the same variable name ${tableName} into the target path of your object store. This allows each table replication to have its own unique file in the target object store.
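
As an illustration, suppose the template writes to a hypothetical target path like the one below. At run time, every occurrence of ${tableName} is replaced with the value supplied for that particular execution:

Target path configured in the operator: /SLT/${tableName}/${tableName}.csv
Resolved path for tableName = MARA:     /SLT/MARA/MARA.csv
Resolved path for tableName = KNA1:     /SLT/KNA1/KNA1.csv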

At this point you can already execute the pipeline with an input parameter from the Run As menu. However, this still requires that you manually enter each table name as an input parameter. In the next step I will share a pipeline that can programmatically enter hundreds of table names into the Run As prompt.



Step 2: Create batch job execution pipeline


Create a new graph using the + button in the upper right corner. Then switch to the JSON representation of this new graph and copy and paste the JSON code found here.

Save the graph.


This pipeline uses a custom Python3 operator to parse the input configuration parameters and an OpenAPI operator to call the Pipeline Modeler REST API, which starts the template graph from step 1 once for each table.
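
For orientation, here is a minimal sketch of what the Python3 operator's script could look like. It assumes the Gen1 Python3 operator API (the api object is injected by the Modeler at runtime), configuration parameters named graphName and tableNames as in step 3, and a request body with a configurationSubstitutions field; the shared JSON graph may differ in its details.

# Sketch of a Python3 operator script (assumes the api object provided by the Modeler runtime).
# It reads the graphName and tableNames configuration parameters, splits the semicolon-separated
# list, and sends one request body per table to the output port that feeds the OpenAPI operator.
import json

def gen():
    graph_name = api.config.graphName  # exact technical name of the template graph from step 1
    # Tolerate a trailing semicolon and stray blanks, e.g. "MARA;KNA1;"
    tables = [t.strip() for t in api.config.tableNames.split(";") if t.strip()]
    for table in tables:
        # Assumed body layout for starting a graph run with a substituted ${tableName} value
        body = {
            "src": graph_name,
            "configurationSubstitutions": {"tableName": table},
        }
        api.send("output", json.dumps(body))

api.add_generator(gen)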

Step 3: Enter configuration parameters


The graphName configuration parameter must be the exact name of the graph template that was modified in step 1. The tableNames parameter must be a semicolon-separated list with no spaces; the last table name in the list does not need to be followed by a semicolon, e.g. MARA;KNA1

In the configuration screen of the “Start Graphs” (OpenAPI) operator, enter the hostname of your SAP Data Intelligence instance (found in your browser’s address bar) as well as the credentials you use to log on. Note that the username must be prefixed with the tenant name followed by a backslash, e.g. “<tenant>\<username>”.

Note that this will store your SAP Data Intelligence credentials in plain text! To avoid this, create a corresponding OpenAPI connection in the Connection Manager instead.
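
To sanity check the hostname and the tenant\username credential format outside the pipeline, a rough standalone sketch of the REST call is shown below. The endpoint path, request body, and all placeholder values are assumptions based on the Pipeline Modeler runtime API and may differ on your version; prefer the Connection Manager approach for anything beyond a quick test.

# Standalone sketch (assumed endpoint and body layout) for starting one parameterized run.
import requests

host = "https://<your-di-instance>"             # hostname from your browser's address bar
auth = ("<tenant>\\<username>", "<password>")   # tenant name, backslash, then username

body = {
    "src": "<graphName>",                        # exact name of the template graph from step 1
    "configurationSubstitutions": {"tableName": "MARA"},
}

# Assumed path of the Pipeline Modeler runtime API
resp = requests.post(host + "/app/pipeline-modeler/service/v1/runtime/graphs", json=body, auth=auth)
resp.raise_for_status()
print(resp.json())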


Step 4: Execute the batch pipeline


One by one, the batch pipeline will start a new instance of the template pipeline for each table listed in the ListTables operator. Note that unless your pipeline from step 1 uses a Graph Terminator operator, each instance will not stop automatically and must be stopped manually from the Status tab.
