werner_daehn
Active Contributor
SAP Data Services is an ETL tool to Extract raw data, Transform it into business meaning and Load the result into a target for reporting, Business Intelligence tools, data migration or any other generic data integration requirement. Building such dataflows and orchestrating them into workflows is the daily task of Data Services users.

A data integration project requires hundreds of dataflows with complex business logic, and it is easy to forget something. Data Services just got the features needed to increase project resilience, albeit through an SAP partner.


When I was product manager for Data Services we added some basic functionality in that regard, but not as much as I would have liked. The main issue is that getting at the data is tougher than it looks, because the majority of it is stored as text in the repository, in the so-called ATL format.

Example: the ATL description for a flat file is stored as free-form text in a database table. Finding all file formats with a certain setting requires interpreting that text, as the query sketch below illustrates.
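
To illustrate the pain, here is a minimal sketch of such a text search. To my recollection the repository tables AL_LANG and AL_LANGTEXT hold the object headers and the ATL text; the table names, columns and the setting searched for are from memory, so verify them against your repository release before relying on this.

    -- Find all flat file formats whose raw ATL text mentions a setting.
    -- AL_LANG holds the object headers, AL_LANGTEXT the ATL text in
    -- chunks; names are recalled from memory, not guaranteed.
    SELECT l.NAME
    FROM   AL_LANG l
    JOIN   AL_LANGTEXT t ON t.PARENT_OBJID = l.OBJECT_KEY
    WHERE  t.TEXT_VALUE LIKE '%column_delimiter%';
    -- Fragile: a LIKE on free-form text misses variations in spacing,
    -- quoting and line breaks, which is exactly the problem.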


Recently I learned that there is a product called 360Eyes for Data Services which does what I envisioned for Data Services, plus more.

360Eyes reads the repository, parses the ATL language and stores the extracted data in database tables ready to be consumed. This has multiple advantages:

  1. The data can be queried and visualized with any reporting tool, including the SAP BI tools (a query sketch follows this list).

  2. All kinds of sanity checks can be built using the reporting tool.

  3. Because the data is never overwritten, comparisons can be made to figure out what changed.

  4. Regulatory business requirements can be validated as another type of check.
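
To illustrate the first point, here is a sketch of the flat-file question from earlier, now against parsed tables. EYES_FILEFORMAT and its columns are hypothetical names used only to show the idea; consult the 360Eyes documentation for the actual schema.

    -- Every flat file format and its delimiters, as plain rows.
    -- Table and column names are hypothetical.
    SELECT FORMAT_NAME, COLUMN_DELIMITER, ROW_DELIMITER
    FROM   EYES_FILEFORMAT
    ORDER  BY FORMAT_NAME;

No text parsing, no guesswork about quoting or line breaks, and the result feeds straight into any reporting tool.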


This boosts the quality of Data Services projects, increasing developer efficiency and the quality of the data integration dataflows, as I can attest from my own customer projects.

Another way to look at it is to use it as part of the unit and integration tests which are common in professional software development today. To give an idea of the savings, here are a few examples from past projects I was involved in.

User story – Initial load


A go-live is a stressful project phase, in a data integration project even more so, because the first task is to perform the initial load, e.g. taking the entire SAP material master data with millions of rows, converting each row to the target data model and loading it into the target system. It has been tested over and over again, and this weekend it must succeed. And it does, with a few minor hiccups. Monday morning people complain that some materials exist twice for an unknown reason. Further analysis shows that all dataflows related to the initial load truncate the target – except one. There the flag had been forgotten.

The sanity check “Select all dataflows named DF_*_INITIAL where the table loader is not set to truncate-table” would have uncovered that issue. But there is no such functionality in Data Services, and checking these settings manually takes a long time. The 360Eyes database contains that information readily: one query, sketched below, and we know the problematic dataflows.
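
A sketch of that check, assuming hypothetical 360Eyes tables EYES_DATAFLOW and EYES_TABLE_LOADER; the actual schema will differ.

    -- Initial-load dataflows whose target loader does not truncate first.
    -- Table and column names are hypothetical.
    SELECT d.DATAFLOW_NAME, l.TARGET_TABLE
    FROM   EYES_DATAFLOW d
    JOIN   EYES_TABLE_LOADER l ON l.DATAFLOW_ID = d.DATAFLOW_ID
    WHERE  d.DATAFLOW_NAME LIKE 'DF!_%!_INITIAL' ESCAPE '!'
    AND    l.TRUNCATE_TABLE = 'N';
    -- Every row returned is a dataflow that would load duplicates on a rerun.

Note the ESCAPE clause: the underscores in DF_*_INITIAL are literal characters, not the SQL single-character wildcard.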

User story – Version control 


Not all Data Services implementations use the versioning functionality provided by central repositories and check-out/check-in processes. This is particularly true in development environments, where changes are frequent. The dataflow comparison functionality allows users to quickly compare two versions and see what changed; a comparison query along those lines is sketched below.
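
Because the extracted data is never overwritten, changed objects can be found by comparing two snapshots. A minimal sketch, assuming a hypothetical EYES_OBJECT_HISTORY table with one row per object and extraction run plus a checksum of the definition:

    -- Objects whose definition changed between two extraction runs.
    -- Table, columns and dates are hypothetical placeholders.
    SELECT v1.OBJECT_NAME
    FROM   EYES_OBJECT_HISTORY v1
    JOIN   EYES_OBJECT_HISTORY v2
           ON v2.OBJECT_NAME = v1.OBJECT_NAME
    WHERE  v1.RUN_DATE = '2024-05-01'
    AND    v2.RUN_DATE = '2024-05-08'
    AND    v2.DEFINITION_CHECKSUM <> v1.DEFINITION_CHECKSUM;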



User story – Performance degradation


Once I got a call from IT: the delta load job took so long that it impacted other SAP ERP tasks scheduled at that time. The delta load used to take one hour when it was developed; now it runs for three hours! A chart visualizing the delta run times for every single day would help to uncover the root cause:

  • Did it happen just by accident, e.g. a slow database?

  • Did the delta load get slower gradually because the data volume increased?

  • Was there a particular day on which the run time jumped from one hour to three hours?


Thanks to the data made available in the 360Eyes database, such a chart can be created easily; the query sketch below shows the idea.
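
Assuming a hypothetical EYES_JOB_EXECUTION table with one row per job run, the chart data is a simple aggregation that any BI tool can plot:

    -- Daily runtime of the delta load job, ready to be charted.
    -- Table, columns and the job name are hypothetical placeholders.
    SELECT CAST(START_TIME AS DATE) AS run_day,
           SUM(RUNTIME_SECONDS)     AS runtime_seconds
    FROM   EYES_JOB_EXECUTION
    WHERE  JOB_NAME = 'JOB_DELTA_LOAD'
    GROUP  BY CAST(START_TIME AS DATE)
    ORDER  BY run_day;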


In the case above, the chart clearly showed that something changed on one particular day. With this information the 360Eyes database can be queried to find out which code change was made.

With Data Services alone, I would need to compare the runtimes manually for every single job execution, write down the execution times of each dataflow before and after, and then investigate the dataflow details.

Summary


These are just examples showing the potential of opening up the SAP Data Services internal data. Actually, I find the approach quite clever: while SAP did incorporate visualizations into the SAP Data Services management dashboard, which may or may not suit one's needs, the approach here is to make the data available in a regular database.

As such, users can either utilize the provided reports or build additional visualizations based on their actual needs. They are no longer limited to the prebuilt visualizations but can instead harness the full power of the BI tools already used in house today.

The data available ranges from the individual Data Services objects and their settings to operational statistics and impact/lineage information – all of that with the complete history, to answer questions like “What changed?” or “Is there a trend?”.

I am looking forward to seeing what else customers will come up with, now that the data is finally available.
