Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
Torsten_
Advisor

2      HANA based Transformation (deep dive)


This blog is part of the blog series HANA based BW Transformation.


Now I will look a little behind the curtain and provide some technical background about SAP HANA Transformations. The information provided here serves only to give a better understanding of BW transformations that are pushed down to SAP HANA.

As part of the analysis of HANA-executed BW transformations we need to distinguish between simple (non-stacked) and stacked data flows. A simple, non-stacked data flow connects two persistent objects with no InfoSource in between; only one BW transformation is involved. We use the term stacked data flow for a data flow with more than one BW transformation and at least one InfoSource in between.

Stability of the generated runtime objects

All information provided here is background information to give a better understanding of SAP HANA executed BW transformations.

It is important to keep in mind that all object definitions can change!

Do not implement anything based on the generated objects!

The

  • structure of a CalculationScenario (view names, number of views, …) or
  • generated SQL statements or
  • PLACEHOLDER
could be changed by the next release, support package or an SAP Note.

2.1      Simple Data Flow (Non-Stacked Data Flow)

A simple data flow is a data flow which connects two persistent BW objects with no InfoSource in between. The corresponding Data Transfer Process (DTP) processes only one BW transformation.

In case of a non-stacked data flow the DTP reuses the SAP HANA Transformation (SAP HANA Analysis Process) of the BW transformation, see Figure 2.1.

Figure 2.1: Non-Stacked Transformation


2.2      Stacked Data Flow

A stacked data flow connects two persistent data objects with at least one InfoSource in between. Therefore a stacked data flow contains at least two BW transformations. The corresponding Data Transfer Process (DTP) processes all involved BW transformations.

In case of a stacked data flow, the DTP cannot use the SAP HANA Transformation (SAP HANA Analysis Process) of the BW transformations. Strictly speaking, it is not possible to create a CalcScenario for a BW transformation with an InfoSource as source object. An InfoSource cannot be used as a data source in a CalculationScenario.


Figure 2.2: Stacked Transformation


Figure 2.2 shows a stacked data flow with two BW transformations (1) and (2) and the corresponding SAP HANA Transformations (3) and (4). There is no tab for the CalculationScenario in the SAP HANA Transformation (3) for the BW transformation (1) with an InfoSource as source object.

Therefore the DTP generates its own SAP HANA Transformation for each BW transformation, see (6) and (7). The SAP HANA Transformations for the DTP are largely equivalent to the SAP HANA Transformations (3) and (4) for the BW transformations.

In the sample data flow above, the SAP HANA Transformation (5) for the DTP gets its own technical ID TR_5I3Y6060H25LXFS0O67VSCIF8. The technical ID is based on the technical DTP ID; the prefix DTP_ is replaced by the prefix TR_.

The SAP HANA Transformations (5) and (6) are in fact a single object. I included it twice in the picture to illustrate that the SAP HANA Transformation (6) is based on the definition of the BW transformation (1) and is used (5) by the DTP.

Figure 2.3 provides a more detailed view of the generated SAP HANA Transformations. The SAP HANA Transformation (6) is based on the definition of the BW transformation (1) and is therefore largely identical to the SAP HANA Transformation (3). (3) and (6) differ only in the source object: the SAP HANA Transformation (6) uses the SAP HANA Transformation / SAP HANA Analysis Process (7) as the source object instead of the InfoSource shown in (3). The SAP HANA Transformations (4) and (7) are also quite similar. They differ only with respect to the target object. In the SAP HANA Transformation (4), the InfoSource is used as the target object. The SAP HANA Transformation (7) does not have an explicit target object; its target is only marked as Embedded in a Data Transfer Process. That means the SAP HANA Transformation (7) is used as a data source in another SAP HANA Transformation, in our case in the SAP HANA Transformation (6).


Figure 2.3: Stacked Transformation (detailed)


The technical ID of the embedded SAP HANA Transformation (7) is based on the technical ID of the SAP HANA Transformations (5) and (6) of the DTP. Only the digit 1 is appended as a counter for the level. This means that in a stacked data flow with more than two BW transformations, the next SAP HANA Transformation would get the additional digit 2 instead of 1, and so on.

Later on we will need the technical IDs to analyze a HANA based BW transformation, therefore it is helpful to understand how they are created.
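For the sample above, the derivation of the technical IDs would look as follows (the embedded ID is shown only as an illustration derived from the rules described above):

     DTP_5I3Y6060H25LXFS0O67VSCIF8      technical ID of the DTP
     TR_5I3Y6060H25LXFS0O67VSCIF8       SAP HANA Transformation (5)/(6) of the DTP (prefix DTP_ replaced by TR_)
     TR_5I3Y6060H25LXFS0O67VSCIF81      embedded SAP HANA Transformation (7) (level counter 1 appended)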


2.3      CalculationScenario


To analyze a SAP HANA based BW transformation, it is necessary to understand the primary SAP HANA runtime object, the CalculationScenario (CalcScenario). The BW Workbench shows the CalcScenario in an XML representation, see (2) in Figure 2.4. The CalculationScenario is part of the corresponding SAP HANA Transformation (Extras → Generated Program HANA Transformation) of the DTP. The CalculationScenario tab is only visible if the Expert Mode (Extras → Expert Mode on/off) is switched on. Keep in mind that if the source object of the BW transformation is an InfoSource, the CalculationScenario can only be reached via the DTP metadata, see »Dataflow with more than one BW transformation« and »Stacked Data Flow«.


The naming convention for the CalculationScenario is:


     /1BCAMDP/0BW:DAP:<Technical ID – SAP HANA Transformation>


The CalculationScenario shown in the CalculationScenario tab, see (2) and (1) in Figure 2.4, is only a local temporary version, therefore the additional postfix .TMP.
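Applied to the sample DTP from section 2.2, the resulting names would look like this (shown only for illustration):

     /1BCAMDP/0BW:DAP:TR_5I3Y6060H25LXFS0O67VSCIF8          persistent CalculationScenario
     /1BCAMDP/0BW:DAP:TR_5I3Y6060H25LXFS0O67VSCIF8.TMP      local temporary version shown in the CalculationScenario tab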


The CalculationScenario processes the transformation logic by using different views (CalculationViews) to split the complex logic into simpler single steps. One CalculationView, the default CalculationView, represents the CalculationScenario itself. The default CalculationView uses one or more other CalculationViews as sources, and so on, see Figure 2.5. A CalculationView cannot be used in SQL statements; therefore a ColumnView is generated for each CalculationView.

Inside the SAP HANA database, the ColumnViews are created in the SAP&lt;SID&gt; schema, see (1). Each ColumnView represents a CalculationView, see paragraph 2.3.2.1 »CalculationScenario - calculationViews«, within a CalculationScenario. The CREATE statement of each ColumnView creates two objects: first the CalculationScenario and second the ColumnView based on the CalculationScenario. All ColumnViews that belong to a SAP HANA Transformation are based on the same CalculationScenario.


Figure 2.4: CalculationScenario and ColumnViews


SAP HANA internally uses JSON notation to represent a CalculationScenario. Figure 2.5 shows the CalculationScenario depicted in Figure 2.4 (2) in a JSON analyzer. The tree representation provides a good overview of how the different CalculationViews are consumed. The JSON-based CalcScenario definition can be found in the Create Statement tab of the column view definition in the SAP HANA studio. The definition is located in the USING clause of the CREATE CALCULATION SCENARIO statement; it starts with ‘[‘ and ends with ‘]’.


Figure 2.5: CalculationScenario in a JSON Analyzer
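Schematically, the Create Statement of such a ColumnView therefore contains two statements of roughly the following shape. This is only a heavily shortened sketch: the JSON content, the column view name and the parameter list are illustrative placeholders, and the exact generated statements are SAP-internal and may change at any time.

     CREATE CALCULATION SCENARIO "SAP<SID>"."/1BCAMDP/0BW:DAP:TR_5I3Y6060H25LXFS0O67VSCIF8" USING
       '[ { "calculationScenario" : {
              "dataSources"      : [ ... ],
              "variables"        : [ ... ],
              "calculationViews" : [ ... ] } } ]';

     CREATE COLUMN VIEW "SAP<SID>"."<ColumnView name>" TYPE CALCULATION
       WITH PARAMETERS ('PARENTCALCINDEX' = '/1BCAMDP/0BW:DAP:TR_5I3Y6060H25LXFS0O67VSCIF8',
                        'PARENTCALCNODE'  = '<CalculationView name>', ...);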


The SAP HANA studio also provides a good tool to visualize a CalculationScenario, see Figure 2.6. To open the visualization tool, click on Visualize View in the context menu of a ColumnView based on a CalculationScenario. The visualization view is divided into three parts. The first part (1) provides a list of the CalculationViews which are used inside the CalculationScenario; depending on the view definition, additional information about variables, filters or attributes is available below the view node. The second part (2) provides an overview of the view dependencies: which view consumes which view? The third part (3) provides context-sensitive information for a view selected in the second part.


Figure 2.6: CalculationScenario visualization in the SAP HANA Studio


Now we will have a deeper look into the following sub nodes of the calculationScenario node (see (2) in Figure 2.4):

  • dataSources
  • variables
  • calculationViews


2.3.1 CalculationScenario - dataSources


The node dataSources lists all available data source objects of the CalculationScenario. The following data sources are used within a CalculationScenario in the context of a BW transformation:

  • tableDataSource
  • olapDataSource
  • calcScenarioDataSource

In the first sample transformation, we only use a database table (tableDataSource) as the source object, see Figure 2.7. The sample data flow reads from a DataSource (RSDS), therefore the corresponding PSA table is defined as tableDataSource. To resolve the request ID, the SID table /BI0/SREQUID is also added to the list of data sources.


Figure 2.7: CalculationScenario – Node: TableDataSource
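Conceptually, the request resolution corresponds to a simple lookup against the SID table. A hand-written sketch of this lookup could look like the statement below; the PSA table name and the field names are only illustrative assumptions, and the CalculationScenario implements the lookup as a join between the two dataSources rather than as a literal SQL statement.

     -- Illustrative sketch: resolve the request SID for the rows of a PSA table
     SELECT psa.*,
            req."SID" AS "REQUEST_SID"                       -- SID of the loading request
       FROM "SAP<SID>"."/BIC/B0001234000" AS psa             -- PSA table of the DataSource (name is hypothetical)
       JOIN "SAP<SID>"."/BI0/SREQUID"     AS req             -- SID table used to resolve the request ID
         ON req."REQUID" = psa."REQUEST";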


In the second sample, a transformation rule Master data read is used in the BW transformation. In this case an olapDataSource is added to the list of data sources. The olapDataSource uses the logical index (0BW:BIA:0MATERIAL_F4) of the InfoObject to read the required master data from the real source tables, see Figure 2.8.


Figure 2.8: CalculationScenario – Node: OLAPDataSource


To read the master data in the requested language, object version and time, the PLACEHOLDERs

  • keydate,
  • objvers
  • langu

are added.

The third sample is a stacked data flow. In a stacked data flow the CalculationScenario from the DTP uses another CalculationScenario as data source. In these cases, the calcScenarioDataSource is used. The variables defined in the upper CalculationScenario are passed to the underlying CalculationScenario to be able to push down these variables (filters) where possible to the source objects, see Figure 2.9.


Figure 2.9: CalculationScenario – Node: CalcScenarioDataSource

The values for the PLACEHOLDERS are passed in the SQL statement by using the variables, see paragraph 2.3.2 »CalculationScenario - variables«.

The PLACEHOLDER values for the variables keydate, objvers and langu are always set in the INSERT AS SELECT statement, whether they are used or not.
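In the generated INSERT AS SELECT statement these three variables appear, schematically, as the following fragment (the values are examples only; depending on the data flow, the variable names are additionally prefixed with the technical ID of the SAP HANA Transformation, see section 2.4):

     'PLACEHOLDER' = ('$$objvers$$', 'A'),
     'PLACEHOLDER' = ('$$langu$$',   'E'),
     'PLACEHOLDER' = ('$$keydate$$', '20250101'),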


2.3.2       CalculationScenario - variables


The node variables defines all parameters which are used in the CalculationScenario and which can be set in the SQL statement to filter the result, see Figure 2.10.


Figure 2.10: CalculationScenario – Node: variables


Placeholder usage by customer

All variables and placeholders defined in a CalculationScenario in the context of a BW transformation are intended for SAP-internal usage only. Variable and placeholder names are not stable, which means they can be changed, replaced or removed.


Figure 2.11 provides a further sample, based on a BW transformation, with several dataSource definitions. A variable is used to control which dataSource, and ultimately which table, the data is read from by the SQL statement.

The sample data flow for this CalculationScenario reads from an advanced DSO (ADSO) (based on a Data Propagation Layer - Template) with three possible source tables (Inbound Table, Change Log and Active Data). For each source table (dataSource), at least one view is generated into the CalculationScenario and all three views are combined by a union operation, see (2).

The input nodes are used to extend all three structures with a new constant field named BW_HAP__________ADSO_TABLE_TYPE. The constant values

  • Inbound Table (AQ),
  • Change Log (CL) and
  • Active Data (AT)

can later be used as values for the filter $$ADSO_TABLE_TYPE$$, see (3). The filter value is handed over by the SELECT statement and depends, for example, on the DTP settings (Read from Active Table or Read from Change Log). To read data only from the active data (AT) table the following placeholder setting is used:

     'PLACEHOLDER'=('$$adso_table_type$$',   '( ("BW_HAP__________ADSO_TABLE_TYPE"=''AT'' ) )'),
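Analogously, if the DTP is set to Read from Change Log, the same placeholder would be passed with the change log constant (shown here only to illustrate the pattern):

     'PLACEHOLDER'=('$$adso_table_type$$',   '( ("BW_HAP__________ADSO_TABLE_TYPE"=''CL'' ) )'),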

For further information see 2.4 »SQL Statement«.

Figure 2.11: CalculationScenario – DataSource and Variable collaboration


2.3.2.1 CalculationScenario - calculationViews


The next relevant node type is the node calculationView. A CalculationScenario uses several layered views (calculationViews) to implement the logic defined in the BW transformation. For the CalculationScenario and for each calculationView, a ColumnView is created, see (1) in Figure 2.4. The CalculationScenario a column view is based on can be found in the definition of each column view.

There are several view types which can be used as sub node of a calculationView:

  • projection
  • union
  • join
  • aggregation
  • unitConversion
  • verticalUnion
  • rownum
  • functionCall
  • datatypeConversion


The view types as well as the number of views used in a CalculationScenario depend on the logic defined in the BW transformation and the BW / SAP HANA release.


A SELECT on a CalculationScenario (ColumnView) always reads from the default view. Only one default view is allowed. The default view can be identified by checking whether the attribute defaultViewFlag is set to “true”. In the JSON representation in Figure 2.5, the default view is always shown as the top node.


The processing logic of each view is described in further sub nodes. The most important sub nodes are:

  • viewAttributes / attributes
  • inputs
  • filter

The sub nodes used in a CalculationView depend on the view type, on the logic defined in the BW transformation, and on the BW / SAP HANA release.


CalculationScenario - calculationViews – view - viewAttributes / attributes


The nodes viewAttributes and attributes are used to define the target structure of a calculation view. The node attributes is used for more complex field definitions like data type mapping and calculated attributes (calculatedAttributes).

The InfoObject TK_MAT is defined as CHAR(18) with the conversion routine ALPHA. To ensure that all values comply with the ALPHA conversion rules, the CalculationScenario creates an additional field TK_MAT$TMP as a calculatedAttribute. Figure 2.12 shows the definition of the new field. The ALPHA conversion rule logic is implemented as a formula based on the original field TK_MAT.


Figure 2.12: CalculationScenario - CalculationAttributes
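The generated formula itself is SAP-internal, but the effect of the ALPHA conversion can be sketched in plain SQL: purely numeric values are padded with leading zeros to the full field length, all other values are left unchanged. The following statement is only a simplified illustration, not the generated expression:

     -- Simplified sketch of the ALPHA conversion logic for the CHAR(18) field TK_MAT
     SELECT CASE
              WHEN "TK_MAT" LIKE_REGEXPR '^[0-9]+$'          -- purely numeric value?
                THEN LPAD("TK_MAT", 18, '0')                 -- pad with leading zeros to length 18
              ELSE "TK_MAT"                                  -- keep all other values unchanged
            END AS "TK_MAT$TMP"
       FROM ( SELECT '4711' AS "TK_MAT" FROM DUMMY );        -- sample value, result: '000000000000004711'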


Calculated attributes are memory-intensive and we try to avoid them where possible. But there are some scenarios where calculated attributes must be used, for example for target fields based on InfoObjects with conversion routines (see Figure 2.12) and for “non-trustworthy” sources. A “non-trustworthy” data source is a field-based source object (that is, not based on InfoObjects), for example a DataSource (RSDS) or a field-based advanced DataStore-Object (ADSO). For “non-trustworthy” data sources, the CalculationScenario must ensure that NULL values are converted to the correct type-related initial values.
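The effect for such sources corresponds to wrapping every field in a NULL-handling expression. The following is only a simplified sketch with hypothetical field names, not the generated code:

     -- Simplified sketch: replace NULL values by type-related initial values
     SELECT IFNULL("MATERIAL", '')          AS "MATERIAL",   -- character-like field: empty string
            IFNULL("QUANTITY", 0)           AS "QUANTITY",   -- numeric field: zero
            IFNULL("CALDAY",   '00000000')  AS "CALDAY"      -- date field (CHAR 8): initial date
       FROM ( SELECT CAST(NULL AS NVARCHAR(18))  AS "MATERIAL",
                     CAST(NULL AS DECIMAL(17,3)) AS "QUANTITY",
                     CAST(NULL AS NVARCHAR(8))   AS "CALDAY"
                FROM DUMMY );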


CalculationScenario - calculationViews – view inputs


A calculation view can read data from one or more sources (inputs). CalculationViews and/or data sources can be used as sources; they are listed under the node input. A projection or an aggregation, for example, typically has one input node, while a union or a join typically has more than one input node.

The input node in combination with the mapping and viewAttribute nodes can be used to define new columns. In Figure 2.13 the attribute #SOURCE#.1.0REQTSN is defined as a new field based on the source field 0REQTSN.


Figure 2.13: CalculationScenario - input

CalculationScenario - calculationViews – view filter


The filter node, see (2) in Figure 2.14, is used in combination with the variable, see (1) in Figure 2.14, to provide the option to filter the result set of the union view (TK_SOI). The filter value is set as a placeholder in the SQL statement, see (3) in Figure 2.14.


Figure 2.14: CalculationScenario - filter

I will come back to the different views later on in the analyzing and debugging section.


2.4 SQL Statement

Running a DTP triggers an INSERT AS SELECT statement that reads data from the source and directly inserts the transformed data into the target object. There are two kinds of filter options available to reduce the processed data volume: the WHERE condition in the SQL statement and the Calculation Engine PLACEHOLDERs. Placeholders are used to set values for the variables / filters which are defined in the CalculationScenario. Which placeholders are used depends on the logic defined in the BW transformation and the BW / SAP HANA release.
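A heavily simplified skeleton of such a statement is shown below; all object names, fields and values are illustrative placeholders, the real statement is generated by the DTP and considerably longer (compare Figure 2.15):

     -- Skeleton of the generated INSERT AS SELECT (illustrative only)
     INSERT INTO "SAP<SID>"."/BIC/A<TARGET>2"                           -- inbound table of the target ADSO (name hypothetical)
     SELECT ...                                                         -- target field list incl. technical fields
       FROM "SAP<SID>"."/1BCAMDP/0BW:DAP:TR_<DTP-ID>"                   -- default ColumnView of the CalculationScenario
            ( 'PLACEHOLDER' = ('$$client$$', '100'),                    -- Calculation Engine placeholders (values illustrative)
              'PLACEHOLDER' = ('$$filter$$', '( "REQUEST" = ''...'' )') )
      WHERE ... ;                                                       -- additional WHERE condition, e.g. derived from the DTP filter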


PLACEHOLDER


The following list describes the most important PLACEHOLDERs which can be embedded in a CalculationScenario and used in the SQL statement. Which PLACEHOLDERs are used in the CalculationScenario depends on the implemented data flow logic.


Important note about the PLACEHOLDER

PLACEHOLDERs are not stable and their definition can be changed by a new release, support package or SAP Note!

It is not supported to use the PLACEHOLDERs listed here inside SQLScript or any other embedded database object in the context of a BW transformation. This also applies to all PLACEHOLDERs used in the generated SQL statement.


$$client$$

Client value to read client dependent data. The placeholder is always set, whether it is used or not.

$$change_log_filter$$

This placeholder is only used to filter the data which is read from the change log. The filter values are equivalent to the values of the placeholder $$filter$$.

The placeholder is only used for advanced DataStore-Objects (ADSO). See also $$inbound_filter$$ and $$nls_filter$$.

$$change_log_extraction$$

If the extraction is done from the change log, the value is set to ‘X’. This is used as part of the error handling.

$$datasource_src_type$$

The placeholder is set to ‘X’ if the data is read from the remote source object and not from the PSA.

$$dso_table_type$$

This placeholder is used to control which table of a standard DataStore-Object (ODSO) is used as the data source. For this purpose, the field BW_HAP__________DSO_TABLE_TYPE can be set to:

  • Active Table (0)
  • Change Log (3)

$$filter$$

This placeholder is used to filter the source data where possible. That means the placeholder is typically applied in the view directly above the union of all available source tables.

This placeholder contains the filters defined in the DTP extraction tab plus some technical filters based on the REQUEST or the DATAPAKID.

If an ABAP routine or a BEx variable is used in the DTP filter, the result of both is used to create the filter condition.

The placeholder $$filter$$ is used for all BW objects except advanced DataStore-Objects (ADSO). To filter an ADSO see $$inbound_filter$$, $$change_log_filter$$ and $$nls_filter$$.

$$inbound_filter$$

This placeholder is only used to filter the data which is read from the inbound table. The filter values are equivalent to the values of the placeholder $$filter$$.

This placeholder is only used for advanced DataStore-Objects (ADSO). See also $$change_log_filter$$ and $$nls_filter$$.


$$keydate$$

Date to read time dependent master data. The variable value is applied to the logical index of an InfoObject in an olapDataSource. The variable is always set, whether it is used or not.

$$langu$$

Language to read master data. The variable value is applied to the logical index of an InfoObject in an olapDataSource. The placeholder is always set, whether it is used or not.

$$navigational_attribute_filter$$

This placeholder lists the filters based on navigation attributes, which are used in the DTP filter.

$$objvers$$

Object version to read master data. The variable value is applied to the logical index of an InfoObject in an olapDataSource. The variable is always set, whether it is used or not.

$$runid$$

For some features it is necessary to store metadata in a temporary table. In a prepare phase, the metadata is inserted into this temporary table and identified by a unique ID (runid). At runtime, the values are then read by using the runid passed via this placeholder.

This placeholder is primarily used in CalcScenarios based on an explicitly created SAP HANA Analysis Process (see transaction RSDHAAP) rather than on a SAP HANA Transformation.

$$target_filter$$

The target filter is used in a SQL statement to ensure that only those records of the result set which match this filter condition are inserted into a target object. This placeholder is used if the filter condition is given by the target object, for example by a semantically partitioned Object (SPO). A target filter is applied to the result set of the transformation.

$$datasource_psa_version$$

Relevant version number of the PSA where the current request is located.

$$DATAPAKID$$.DTP

This value is set by the package size parameter maintained in the DTP. More information, especially about dependencies, can be found in the field help (F1) for the DTP package size parameter.

$$REQUEST$$.DTP

This placeholder contains the request ID for the target request.

In the simulation mode, the value is always set to the SID value of the request ID DTPR_SIMULATION in the request table /BI0/SREQUID.


The following SQL SELECT statement, see Figure 2.15, belongs to the CalculationScenario as shown in Figure 2.11. The SQL statement passes the value for the variable adso_table_type. The placeholder

     'PLACEHOLDER'=('$$adso_table_type$$',   '( ("BW_HAP__________ADSO_TABLE_TYPE"=''AT'' ) )'),

sets the field BW_HAP__________ADSO_TABLE_TYPE to ’AT’. In the union definition in Figure 2.11, see (2), the field BW_HAP__________ADSO_TABLE_TYPE is set to the constant value ’AT’ for all rows provided by the active data table. In this way, the placeholder ensures that only data from the active data table is selected.

Figure 2.15: DTP - SELECT statement

Some PLACEHOLDERs get the transformation ID as a prefix to ensure that the PLACEHOLDER identifiers are unique, see the placeholder $$runid$$ in Figure 2.15 and the calcScenarioDataSource description in the blog HANA based BW Transformation.

8 Comments
Tjarko
Explorer
Dear Torsten, I never thought we could deep dive so far into the HANA logic. And it seems for us (Customer) there is a lot to learn. Great Blog. Thanks Tjarko
former_member102219
Participant
Thank you for this great info Torsten. May I ask you to please shed some light on data packaging in HANA-based transformations. I see the SELECT-statement you provided doesn't have any instructions on package size etc. What if that SELECT is from a Calc View -- does the Calc View have to materialize all its data before select would take place? If so, what if it has so much data that it may not fit the memory limit?
Torsten_
Advisor
Hi Roman,
the package size is evaluated in a pre-step in case packaging is possible.
Up to BW 7.50 SP04 and SAP HANA SP11 it is only possible to create packages if the source object provides technical fields (like the change log).
In case packaging is possible, the pre-step calculates how many source data packages can be collected together into one package.
You can see this in the SQL statement (INSERT AS SELECT):
'PLACEHOLDER'=('TR_00O2TMD4JQODBR18P89UDOEXX.$$filter$$', '( ( "REQUEST" = ''ODSR_CJQTEO2HMSE05TNSTB1K5977J'' AND ( "DATAPAKID" >= ''000001'' AND "DATAPAKID" <= ''000001'' ) ) ) ')

If packaging is not possible, you're right: out-of-memory situations are possible. Please check the note
2230080 – Consulting: DTP: Out of memory situation during ‘SAP HANA Execution’ and the ‘Request by R...

I've collected some further information in the sub blog HANA based BW Transformation – SAP Notes

Torsten
former_member102219
Participant
Thanks! Does it mean that if I build a Calc View with technical fields in it, I could access the packaging functionality? Alternatively, will it be possible in the later releases to push-down semantic partitioning to HANA?
Torsten_
Advisor
Hi,
no, the technical fields are in the source object. If you read from a classic DSO (ODSO) and you select the change log as the source table, there are technical fields like REQUEST, DATAPAKID and RECORD. These technical fields can be used to read data in packages.
We described this in the mentioned note 2230080.

What do you mean with "push-down semantic partitioning to HANA"?
Do you want to use a SPO as source and push down the transformation reading from the SPO?
In that case you can encapsulate the SPO in a CompositeProvider. The CompositeProvider can be used as source for a transformation.

Torsten
former_member102219
Participant
Thank you for bearing with me Torsten. The scenario I tried to implement (in 7.5 SP03) was like this:

1. use an External View of a BW InfoProvider, e.g. an ADSO, in a custom Calc View,
2. implement a few transformations in that Calc View,
3. expose that Calc View back to BW as Composite Provider,
4. create a HANA-optimized BW transformation from that Composite Provider to a target (an ADSO) with simple 1:1 mappings (for data snapshot).

It actually worked and indicated better performance than ABAP-based transformation.

What I discovered runtime, judging by HANA memory consumption, is that the ETL executes in kinda two phases:
A. the Calc View materialized all resulting data first. At this time no activity has been recorded in DTP log, and I could only see memory consumption increasing in HANA admin cockpit;
B. the materialized data has been inserted into the BW target in packages, that activity being recorded in DTP log.

Obviously, if phase A materializes too much data, the process will run out of memory and crash, hence I'd need to make it somehow understand either the REQUEST-DATAPAKID fields of the source, or use something similar to DTP semantic partitioning that would be pushed down to HANA, so phases A-B would be executed a few times, each time for a single value of a partitioning characteristic.

Is this all realistic, or a dead-end not recommended by SAP?

I've read the note 2230080 but I fail to see how it could be applied. I guess I'd have to redesign points 2-4 of my implementation to use a SQL script based transformation (I'll use SP06 system for that).
Former Member
Hi Roman, How are you? Long time no hear. A very interesting question you are rising here. Take care
former_member102219
Participant
Hi Philipp, I'm all good thanks, drifting into Hadoop/Spark/Scala area these days, actually. And you?

Regarding the questions I asked above, I finally ended up implementing what I mentioned - a wrapper that picks some semantic partitioning characteristic and runs the DTP-on-HANA a few times in a loop.