Torsten_ (Advisor)
Last change on 01/28/2018

This blog is part of the blog series HANA based BW Transformation.

The following list is based on experience with real customer incidents and is intended to help customers learn from the mistakes of others.

9.1 SQL Script Procedures


Some SQLScript best practice tips can be found in the chapter »Best Practices for Using SQLScript« in the SAP HANA SQLScript Reference.

Here are some general SQLScript recommendations:

  • Do not mix SQLScript and CE functions within a script.

    • Example: do not write SQLScript that reads from a procedure or calculation view implemented with CE functions.



  • Pass all parameters down to the lowest level as required.

  • Reduce the size of the data set as early as possible in the data flow. This can be done by applying static filters or applying dynamic filters using parameters at the lowest level.

  • Minimize the size of any intermediate result sets in complex processing. Do not include any extra attribute columns that may increase the rows of aggregated results being passed between nodes in the data flow.

  • Avoid extremely large intermediate result sets, even if they will be reduced or aggregated before the final result is returned to the client. These can have high memory requirements.

  • Perform calculations at the last node possible in the data flow (ideally in the final aggregation or projection). This reduces the number of rows on which you perform the calculations and also reduces the number of columns transferred from one node to another.

  • In some situations, you can use a union and then aggregate the results instead of using a join for better performance (see the sketch after this list).

  • Reduce Complexity of SQL Statements: Break up a complex SQL statement into many simpler ones. This makes a SQLScript procedure easier to comprehend.

  • Identify Common Subexpressions: If you split a complex query into logical subqueries, it can help the optimizer to identify common subexpressions and to derive more efficient execution plans.

  • Understand the Costs of Statements: Employ the explain plan facility to investigate the performance impact of different SQL queries used in scripts.

  • Reduce Dependencies: As SQLScript is translated into a dataflow graph, and independent paths in this graph can be executed in parallel, reducing dependencies enables better parallelism, and thus better performance.
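
To illustrate the union-plus-aggregation tip from the list above, here is a minimal SQLScript sketch; the table variables :actuals and :plan and their columns are hypothetical placeholders:

-- Join variant: combines the two inputs on a shared key.
joined  = SELECT a."ACCOUNT", a."AMOUNT" + p."AMOUNT" AS "AMOUNT"
            FROM :actuals AS a
           INNER JOIN :plan AS p
              ON a."ACCOUNT" = p."ACCOUNT";

-- Union variant: stack both inputs, then aggregate once.
unioned = SELECT "ACCOUNT", "AMOUNT" FROM :actuals
          UNION ALL
          SELECT "ACCOUNT", "AMOUNT" FROM :plan;
result  = SELECT "ACCOUNT", SUM("AMOUNT") AS "AMOUNT"
            FROM :unioned
           GROUP BY "ACCOUNT";

Note that the two variants only return the same accounts if every account occurs in both inputs; the union keeps accounts that the inner join would drop.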


The following SQLScript issues relate to SQLScript in the context of BW.

9.1.1 Initial values


Within BW, all fields are defined as NOT NULL columns. The developer is therefore responsible for ensuring that all data in the result table of a SQLScript routine is initialized in a type-conform way. The SQL function COALESCE can be used to avoid NULL values and to ensure that all values are initialized correctly in case of NULL values. Characteristic sample:


COALESCE( inTab."MATERIAL”, '' ) AS "MATERIAL”

In the case of a numeric field, it is important to initialize the field with a numeric value. The default initial value for a numeric field is 0, not ''. Key figure sample:


COALESCE( inTab."KEYFIGURE”, 0 ) AS "KEYFIGURE”.
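
Combined into a routine, a minimal sketch could look as follows (assuming an inTab with the two fields used above):

-- Initialize both fields type-conform before writing them to the output table.
outTab = SELECT COALESCE( inTab."MATERIAL",  '' ) AS "MATERIAL",
                COALESCE( inTab."KEYFIGURE", 0 )  AS "KEYFIGURE"
           FROM :inTab;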

9.1.2 Avoid preventing filter push down


The optimizer tries to push filters down as close as possible to the source to keep the data volume as small as possible. Preventing the filter push down can have a negative performance effect.


The calculation engine tries to optimize the SQLScript code and to compile all local SELECT statements into one SELECT statement. For this, the individual statements are inlined. Sample:


Original statements:

tab = select A, B, C from T where A = 1;

tab2 = select C from :tab where C = 0;

In-lined statement:

tab2 = select C from (select A,B,C from T where A = 1) where C=0;

For further information about inlining, see the hints NO_INLINE and INLINE. These hints can also be used to force or prevent inlining for a single SELECT.
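
A minimal sketch based on the statements above; the NO_INLINE hint keeps the first statement from being merged into the second (hint availability depends on the HANA revision):

-- The first statement is materialized on its own and not inlined into tab2.
tab  = SELECT A, B, C FROM T WHERE A = 1 WITH HINT( NO_INLINE );
tab2 = SELECT C FROM :tab WHERE C = 0;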

A further option to prevent inlining is to put the code into a sequential execution block. Statements inside such a block (BEGIN SEQUENTIAL EXECUTION … END;) are not inlined and are executed sequentially. Note that forcing sequential execution prevents parallelization!
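
A sketch of the same statements wrapped in a sequential execution block:

BEGIN SEQUENTIAL EXECUTION
  -- Executed one after the other, without inlining and without parallelization.
  tab  = SELECT A, B, C FROM T WHERE A = 1;
  tab2 = SELECT C FROM :tab WHERE C = 0;
END;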


The use of imperative SQLScript logic also prevents the optimizer from inlining the individual SQL statements. The recommendation is therefore to implement SQLScript procedures in the BW transformation context with declarative SQLScript logic, if the logic allows it.


Use of imperative SQLScript logic

The SQL code of the procedure is translated into L code if control structures such as loops or IF conditions, or declarations of local variables, are used; see Figure 9.1.

 



Figure 9.1: Resulting CeLjitPop for SQLScript code converted to L

Use

  • hints

  • the sequential execution block and

  • imperative SQLScript logic


with care!

Sometimes it makes sense to test whether one of these features improves performance.

 

9.1.3 Concatenate in SQL Script within an AMDP Method (Added on 08/03/2017)


The following statement

SELECT 'A' || ' ' || 'B' || ' ' || 'C' FROM <any table>;

within an AMDP method leads to the result 'ABC'.

The expected blanks are missing. To get the expected result, it is necessary to insert the space using the CHAR Function (String).

The following statement

SELECT 'A' || CHAR(32) || 'B' || CHAR(32) || 'C' FROM <any table>;

delivers the expected result 'A B C'.

 

9.1.4 Keep internal table small


Keeping the internal tables small is a general recommendation and means: always check your WHERE conditions twice. While processing the calculation scenario, the calculation engine has to materialize some calculation views. Materializing views means allocating memory. From a performance and memory allocation perspective, we recommend reducing the data volume as early as possible, see section 9.1.5 »Use of DISTINCT«. All SAP HANA engines try to push filter conditions down as close as possible to the sources.
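
As a small sketch of this idea: apply the restricting WHERE condition before a join, not afterwards. The filter field "COMP_CODE" and its value are hypothetical placeholders; the other names are borrowed from the example in section 9.1.5:

-- Filter first, then join the already reduced data set.
small   = SELECT "BPARTNER", "BP_EXTERN"
            FROM :inTab
           WHERE "COMP_CODE" = '1000';

L_ITAB2 = SELECT s."BPARTNER", s."BP_EXTERN"
            FROM :small AS s
           INNER JOIN :L_ITAB1 AS l
              ON s."BP_EXTERN" = l."BP_EXTERN";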

 

 

9.1.5 Use of DISTINCT


A join can multiply the number of resulting rows, and if you select only a subset of the whole width, a DISTINCT can reduce the resulting number of rows significantly.

The customer coding below results in an internal table with more than 700,000,000 rows (inTab ~ 16,000,000 rows).

L_ITAB2 = SELECT "BPARTNER",
                 "BP_EXTERN"
            FROM :L_ITAB1
           INNER JOIN :inTab
              ON :L_ITAB1."BP_EXTERN" = :inTab."BP_EXTERN"
           WHERE ...;

By adding DISTINCT to the statement, the number of rows could be reduced to 1,300.

L_ITAB2 = SELECT DISTINCT "BPARTNER",
                 "BP_EXTERN"
            FROM :L_ITAB1
           INNER JOIN :inTab
              ON :L_ITAB1."BP_EXTERN" = :inTab."BP_EXTERN"
           WHERE ...;


9.1.6 Potential pitfall at UNION / UNION ALL


The following aspects should be considered when using the UNION operator.

  • The structure of the first SELECT statement defines the column names of the resulting structure (table)

  • The UNION operator matches the columns from the involved SELECT statements by position and NOT by name

  • If the data types of a column differ from each other, the HANA engine will try to determine an appropriate common data type


The individual SELECTs in a UNION can be based on a persistent database table or an internal SQLScript table (iTab). When combining several SELECT statements into a new result table, one must pay attention to the column order of every single SELECT statement.

Figure 9.2 shows in (1) and (2) two table definitions with the same columns but a different column order. (3) shows the structure definition for the output parameter of the procedure in (4).

The first UNION in (5) uses the * operator instead of an explicit column list (see comments). The column names of the resulting table are based on the columns ("CHA_01", "CHA_02", "KYF_01", "KYF_02", "RECORD") of the first SELECT statement. The data types of the third (KYF_01 / RECORD) and the fifth (RECORD / KYF_02) columns of the two SELECT statements differ, so the values of these columns end up mixed in the result. The second UNION avoids this by using an explicit field list in each SELECT statement.


Figure 9.2: UNION without and with explicit field list
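
A minimal sketch of the pitfall and its fix, using the column names from Figure 9.2 (the table variables :tab1 and :tab2 are hypothetical stand-ins for the two differently ordered tables):

-- Pitfall: * matches the columns by position, so KYF_01/RECORD and RECORD/KYF_02 get mixed up.
unionBad  = SELECT "CHA_01", "CHA_02", "KYF_01", "KYF_02", "RECORD" FROM :tab1
            UNION ALL
            SELECT * FROM :tab2;

-- Fix: an explicit field list in every SELECT aligns the columns correctly.
unionGood = SELECT "CHA_01", "CHA_02", "KYF_01", "KYF_02", "RECORD" FROM :tab1
            UNION ALL
            SELECT "CHA_01", "CHA_02", "KYF_01", "KYF_02", "RECORD" FROM :tab2;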


9.1.7 Pitfalls in field routines


The runtime behavior of AMDP field routines differs from the runtime behavior of ABAP field routines.

The processing logic of an ABAP field routine is based on an external loop over the source package: for each row, the field routine is called with single values. The input fields of an ABAP field routine are single-value (scalar) variables. An ABAP field routine can consume several single values, depending on the field routine definition, and always returns (exports) exactly one single-value field.

The processing logic of an AMDP field routine, in contrast, is based on all values in the selected columns of the source package currently being processed. That means that when the AMDP field routine is called, all values of the selected input fields are handed over in one call. In Figure 9.4 only one source field is selected; therefore, the table-based input parameter of the AMDP routine contains one data column, plus further technical columns.



Figure 9.4: Field Routines Definition

The transformation process therefore expects from the AMDP field routine an output table with the same number of rows as the input table!

To get a better understanding of why this is so important, I'll provide some background information about the runtime processing.

A field based AMDP routine is embedded in a Calculation Scenario. For more details read through HANA based Transformation (deep dive).

Figure 9.5 shows the internal processing of an AMDP field routine inside the calculation scenario. After extracting the data from the source object, the start routine is called (if a start routine is available). In the next step the transformation process splits the data set into several parts. To illustrate this, Figure 9.5 is based on a transformation with two AMDP field routines; the data package is therefore split into three parts: one part for each field routine and a third part for the other transformation rules, such as formulas or direct assignments.

This enables the underlying engine to parallelize the transformation process.

After processing the field routines and the other rules, it is necessary to bring the results back together. To do so, all three data packages must be brought into the same order. Each data package contains two technical columns for this purpose. The calculation scenario contains an order projection for each processing thread to align the data of each thread in the same order.

The vertical union brings the separated data packets back together for further processing.



Figure 9.5: Field Routines and Vertical Union Part 1

Regarding the split packages, there are two prominent difficulties:

  1. Changing the number of rows within an AMDP field routine

  2. Changing the sorting order of the data column within an AMDP field routine


9.1.7.1 Changing the number of rows within an AMDP field routine


The following coding example is supposed to set the attribute vendor (TK_VENDOR) if there is a vendor for the material (TK_MAT). The last part of the WHERE condition (and mat."/BIC/TK_VENDOR" <> '') filters out records without a vendor. This can reduce the number of rows.

 
METHOD S0001_G01_R11 BY DATABASE PROCEDURE FOR HDB LANGUAGE SQLSCRIPT OPTIONS READ-ONLY USING /BIC/PTK_MAT.

  -- target field: TK_VENDOR
  outTab = SELECT mat."/BIC/TK_VENDOR",
                  inTab.record,
                  inTab.sql__procedure__source__record
             FROM :inTab AS inTab
            INNER JOIN "/BIC/PTK_MAT" AS mat
               ON inTab."/BIC/TK_MAT" = mat."/BIC/TK_MAT"
            WHERE mat.objvers = 'A'
              AND mat."/BIC/TK_VENDOR" <> '';

  -- empty error table (no error handling in this example)
  errorTab = SELECT '' AS ERROR_TEXT,
                    '' AS SQL__PROCEDURE__SOURCE__RECORD
               FROM dummy
              WHERE dummy <> 'X';

ENDMETHOD.

Listing 9.1: Example of a field routine that removes records

Figure 9.6 shows a further example routine in which rows can get removed. (1) shows the whole data package we get from the source object or from the start routine. (2) shows the package split. Here we can see that each package gets two technical columns, SQL__PROCEDURE__SOURCE__RECORD and RECORD, in addition to the existing data column(s). For further information on these two fields, see my blog HANA based BW Transformation – New features delivered by 7.50 SP04.

(3) shows the field routine source code and the other transformation rules. The field routine logic doesn't write all rows from the input table into the output table. The row that isn't transferred is crossed out in (4). The resulting data package is shown in (5). We get a NULL value, as seen in (6), caused by the pitfall in the coding.

In addition to the NULL value, part of the data package is shifted: the values below the missing row move up by one position.

NULL values are not allowed in BW, and the DTP therefore leads to an error.



Figure 9.6: Field Routines and Vertical Union Part 2

In both cases, Listing 9.1 and Figure 9.6, the faulty behavior may occur, but not necessarily. The test data in development systems are often well prepared, which means the error may not show up there.

Figure 9.7 provides an example of how to avoid the pitfall: the unwanted values are replaced by an initial value instead of being filtered out.



Figure 9.7: Field Routines and Vertical Union Part 3
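
A sketch of a corrected version of Listing 9.1, following the approach from Figure 9.7: a left outer join keeps one output row per input row, and COALESCE replaces the unwanted vendor values with an initial value instead of filtering the rows out:

outTab = SELECT COALESCE( mat."/BIC/TK_VENDOR", '' ) AS "/BIC/TK_VENDOR",
                inTab.record,
                inTab.sql__procedure__source__record
           FROM :inTab AS inTab
           LEFT OUTER JOIN "/BIC/PTK_MAT" AS mat
             ON inTab."/BIC/TK_MAT" = mat."/BIC/TK_MAT"
            AND mat.objvers = 'A'
            AND mat."/BIC/TK_VENDOR" <> '';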

9.1.7.2 Changing the sorting order of the data column within an AMDP field routine


Within a field routine it is important to keep the sort order of the data columns stable. Therefore, do not change or delete the values of the two technical columns SQL__PROCEDURE__SOURCE__RECORD and RECORD. It is also not allowed to add or delete rows.

Figure 9.8 shows how data can be sorted and how not. (1) shows the input data with one data column (Col 5). If it is necessary to sort the data of Col 5, it is important to keep the order stable with regard to the two technical fields SQL__PROCEDURE__SOURCE__RECORD and RECORD.

(2) is a negative example showing a potential source of error caused by changing the sort order. The step after the field routine sorts the field routine result (outTab) by the two technical columns; in case (2) the resulting order is not deterministic.

(3) shows how the data can be sorted by the data column while keeping the sort order stable with regard to the technical fields.



Figure 9.8: Changing the sorting order of the data column within an AMDP field routine

Changing the sort order of the data column causes errors when the individual packages are merged.

This kind of error is very difficult to identify: it does not lead to a technical error, but the values of one or more columns end up in the wrong rows.
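
As a hedged sketch of variant (3) in Figure 9.8: if the data column must be sorted, include the two technical columns as trailing sort criteria so that the order stays deterministic with regard to them (the column name "COL5" is taken from the figure):

-- Sort by the data column; the technical columns act as deterministic tie-breakers.
sorted = SELECT "COL5", "RECORD", "SQL__PROCEDURE__SOURCE__RECORD"
           FROM :inTab
          ORDER BY "COL5", "SQL__PROCEDURE__SOURCE__RECORD", "RECORD";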

9.1.7.3 Recommendations regarding AMDP field routines


We do not recommend using field routines to implement complex logic such as reading additional data or implementing complex business logic.

The general purpose of, and our recommendation for, field routines is to implement logic that cannot be implemented as a formula. Some formulas can prevent a push down, see SAP Note 2423215 (SAP HANA Processing: Non Supported Formulas in HANA Transformations). Such formulas can be implemented as field routines instead.

 

... coming soon...
20 Comments
wounky (Participant)
Hello Torsten,

 

Looking forward to the continuation. Keep up the good work!

It would be great if you could share some loop & lookup code samples (like in ABAP with field symbols).

 

Best regards,

Sebastian
Former Member
Hello Torsten,

you mentioned in your blog that it is possible to join the :inTab with a DSO, like the following:
outTab = SELECT intab."DOC_NUMBER",
                intab."RECORDMODE"
           FROM :inTab AS intab
          INNER JOIN "_SYS_BIC"."system-local.bw.bw2hana/FSDAHD01" AS sd
             ON sd."DOC_NUMBER" = intab."DOC_NUMBER";

We have the problem (as can be seen in PlanViz) that the system extracts all data from "_SYS_BIC"."pr.sd/DPRSDL016", which leads to high memory usage.

If we don't use the generated view (the DSO setting »External SAP HANA view for reporting«) and instead use the active table of the DSO, "/BIC/AFSDAHD01", the system extracts only the data records that are needed, which results in considerably less memory usage (approx. only 20% of the previous scenario).

We use a classic DSO; the generated view is an analytic view.

 

Any suggestions?

Best Regards,

Florian
Torsten_ (Advisor)
Hi,

it's okay to use the active table here.

As long as we do not offer a SQLScript API to determine the active DSO table, it is okay to use the hard-coded name.

But keep in mind: if you later migrate to BW/4, the classic DSO will be migrated to an advanced DSO, which means the name of the active table will change as well.

Our upcoming migration tool will point out such code blocks, but from the current point of view the coding must be handled by the customer.

Torsten
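
Based on the discussion above, a minimal sketch of the lookup against the active table of the classic DSO (the table name is taken from Florian's example; after a migration to BW/4 the active table name will change):

outTab = SELECT intab."DOC_NUMBER",
                intab."RECORDMODE"
           FROM :inTab AS intab
          INNER JOIN "/BIC/AFSDAHD01" AS sd
             ON sd."DOC_NUMBER" = intab."DOC_NUMBER";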
Former Member
Hi Torsten,

Firstly thanks for your answer.

We also plan to use NLS later (license and system exist, but DAPs aren't created yet). The recommendation there is to use the generated view, because otherwise the archived data would not be read.

For me the behavior of the HANA database seems like a bug.

Normally, if I join table A with table B, and table A has 1,000 entries (which are also the distinct keys) and B has 100 million entries of which 2,000 match table A (according to the join key), the database should only extract the 2,000 matching entries from table B and not the complete table.

This seems to work correctly if we use the active table of the DSO, but it doesn't if we use the generated view.

Any other suggestions? We also opened a ticket…

Regards,

Florian
Torsten_ (Advisor)
Hi,

could you send me the incident number via mail?

torsten.kessler@sap.com

Torsten

 
Hi Torsten,

Nice document on HANA based BW Transformation – guidelines and recommendations. Keep up the good work! Looking forward to your continuation on other topics too 🙂

 

BR/

Anand Kotha

 

 
PavolF (Explorer)
Hi Thorsten,

 

maybe you mention it somewhere, but I would like to ask whether you have compared the performance of these BW AMDP transformations with HANA native views. I really like developing these BW transformations for my clients, but when I attended case studies at an SAP conference I was surprised to see that many clients (e.g. Lufthansa) opt for multiple (hundreds of) HANA native views instead of these new BW transformations. They calculate everything in several steps via HANA views, where previously start/end/expert routines were used. They often talk about how incredibly fast it is compared to the ABAP world, but I could not get any comparison with these new BW transformations.

 

Would be really happy to hear your opinion regarding this.

 

Thanks.

 

Cheers,

Pavol

 

 

 
Torsten_ (Advisor)
Hi,

I don't quite understand what exactly you want to compare!?!

An "AMDP transformation" does not really exist. We are able to process a transformation in two different runtimes: in ABAP and in HANA.

A transformation executed in the ABAP runtime is, in the end, an ABAP report.

A transformation executed in the HANA runtime is, in the end, a CalculationScenario.

We only talk about AMDP if the customer has added at least one routine.

In both cases (with or without AMDP), a HANA-executed transformation is based on a HANA view (and this HANA view is based on the CalculationScenario).

Our recommendation is to use the HANA processing (ideally without AMDP routines) whenever possible.

In the case of adding a routine you have to consider:

  • Is the logic very complex (do I need loops, conditions, ...)?

  • Is the amount of data very large (big data)?


Depending on the answers above, you have to decide which kind of coding (ABAP or SQLScript (AMDP)) is better for your case.

Torsten
PavolF (Explorer)
Hi Thorsten,

 

in an AMDP BW transformation I create a BW transformation and connect it to the next aDSO. In the HANA native approach they take a view of an aDSO or CompositeProvider in HANA Studio and create other HANA native views between different providers/tables. They don't need intermediate aDSOs (with AMDP I think you wrote we should not have more than 2 transformations between InfoProviders); they just create calculation views.

I create a normal BW transformation without HANA Studio. I can use BW master data lookups in fields, or other things in SAP GUI. Then I create a start routine where I write SQLScript in AMDP. I do not create any physical HANA views, though they may be created based on what I program in the start routine.

 

Pavol

 

 
Torsten_ (Advisor)
Okay,

what you describe is a so-called mixed scenario.

They read data from a BW object within a native CalculationView and do a lot of stuff in the CalculationView, like joining master data or other BW tables / external views.

And later on they bring the result back into BW by adding the CalculationView to a CompositeProvider.

Am I right so far?

The difference between this approach and a BW transformation is:

  • in the mixed scenario, the data is not stored in the CompositeProvider; the logic of the CalculationView must be executed for each request based on the HCPR


As the data volume grows, the mixed scenario can run into performance issues.

Both approaches are valid. It's quite difficult to say which way is better; it always depends on the business case.

 

Torsten
PavolF (Explorer)
Hi Thorsten,

 

yes, that's it. I did not fully understand your comment that the logic must be executed for each request. Isn't it the same as in a BW transformation? When I create my AMDP start routine, it is also executed for each request. In ABAP there are 50,000 records per request and in HANA mode we have 1 million records at the same time.

 

I thought HANA is designed especially for big data volumes, so it should not have problems with it. Is there maybe an SAP OSS Note describing possible volume issues?

 

The problem with BW transformations is that SAP keeps adding technical fields for AMDP routines. Last year two technical fields were added, and this year there is at least one more. When my client performs an SP upgrade of his 7.5 system, they run into problems when trying to reactivate the transformations. They then have to manually add the new technical field to the AMDP coding.

 

Thank you for all your hints.

 

Cheers,

Pavol

 

 
Torsten_ (Advisor)
Hi,

yes, that's it. I did not fully understand your comment that the logic must be executed for each request. Isn't it the same as in a BW transformation? When I create my AMDP start routine, it is also executed for each request. In ABAP there are 50,000 records per request and in HANA mode we have 1 million records at the same time.

Yes, you are right if you only consider the staging request. But what happens in the case of a query request? When using a virtual provider (HCPR), the logic of the view is executed for each request.

 

Regarding HANA and big data: I'm quite sure there are several notes available on this topic. But to find the right note you need the revision.

 

Regarding your issue related to the technical fields:

Sometimes it is necessary to enhance an interface to implement customer requirements.

We always try to change an interface in a way that everything is optional, so that there is no impact for the customer after an upgrade.

But in the case of SQLScript that is a little bit complicated; SQLScript is more restrictive than ABAP.

Torsten

 
kienchanrio1 (Discoverer)
Is it possible to create an AMDP transformation where you only populate/map some of the target fields? Say I have 10 target fields and my AMDP only populates 3 of them. Has anyone tried that? I can't seem to get it to work; it won't let me activate unless all fields are mapped. But as you know, in a normal transformation we don't have to populate/map all the fields.
Torsten_ (Advisor)
Okay, I assume you are talking about BW/4 (?) and I further assume you are talking about expert routines.

When using an expert routine, you are responsible for the whole transformation logic. That means all fields, including the initial values for the unused fields.

 

Torsten

 
jozsef_zerczi (Explorer)
One option is to use an AMDP start routine instead of the expert routine, as the start routine offers the possibility to select target fields. It worked for me.

József.
Hi Torsten,

when writing an expert routine in HANA, or an end routine, how do we handle different keys of source and target? For example, if we need to sum key figures, do we need to sum them using the target keys, or using the non-key-figure InfoObjects including technical fields like RECORDMODE, RECORD...?

Regards,

Shidong
Torsten_ (Advisor)
Hi Shidong,

the question is not so easy to answer.

  • Is the target key smaller (reducing) or bigger (enlarging) compared to the source key?

  • I assume reducing:

    • in the case of an end routine you should already receive the reduced key, and as far as I understand your scenario there is nothing to do regarding key figure aggregation

    • in the case of an expert routine you are responsible for aggregating or overwriting the key figure values




Torsten
former_member638921 (Participant)
Hi,

I used the select query below with GROUP BY to aggregate a key figure (TURNOVER), but the aggregation function SUM(TURNOVER) is not working in the SELECT. Please let me know the correct way to aggregate a column in a HANA-based BW transformation (BW/4HANA).

The select query below is implemented in an expert routine.

outTab = SELECT FISCYEAR, FISCPER, FISCVARNT, GL_ACCOUNT, CHRT_ACCTS, COMP_CODE,
                PROFIT_CTR, COSTCENTER, CO_AREA, FUNC_AREA, CURTYPE, RECORDMODE,
                CURRENCY, SUM("TURNOVER") AS TURNOVER,
                RECORD, SQL__PROCEDURE__SOURCE__RECORD
           FROM :inTab
          GROUP BY "FISCYEAR", "FISCPER", "FISCVARNT", "GL_ACCOUNT", "CHRT_ACCTS",
                "COMP_CODE", "PROFIT_CTR", "COSTCENTER", "CO_AREA", "FUNC_AREA",
                "CURTYPE", "RECORDMODE", "CURRENCY",
                "RECORD", "SQL__PROCEDURE__SOURCE__RECORD"
Torsten_ (Advisor)
Hi,

from the SQL perspective I don't see any error in your statement.

You can play around with the coding from the HANA help:
DROP TABLE "MyProducts";
CREATE COLUMN TABLE "MyProducts"(
"Product_ID" VARCHAR(10),
"Product_Name" VARCHAR(100),
"Category" VARCHAR(100),
"Quantity" INTEGER,
"Price" DECIMAL(10,2),
PRIMARY KEY ("Product_ID") );

INSERT INTO "MyProducts" VALUES('P1','Shirts', 'Clothes', 10, 20.99);
INSERT INTO "MyProducts" VALUES('P2','Jackets', 'Clothes', 20, 99.49);
INSERT INTO "MyProducts" VALUES('P3','Trousers', 'Clothes', 30, 32.99);
INSERT INTO "MyProducts" VALUES('P4','Coats', 'Clothes', 40, 129.99);
INSERT INTO "MyProducts" VALUES('P5','Purse', 'Accessories', 10, 89.49);


SELECT SUM("Quantity") , "Category"
FROM "MyProducts"
--WHERE "Product_Name" IN ('Jackets', 'Coats')
group by "Category"


What kind of error do you get?
Or what do you mean by "But the aggregation function:SUM(TURNOVER) is not working ..."

Torsten
Hi Torsten,

When I try to create a column table in a BW/4HANA HANA transformation, it throws an error for the CREATE statement mentioned above:

"MYPRODUCTS" is unknown. ABAP objects and DDIC objects must be declared in the METHOD statement. Local names must start with ":" here.

We have a scenario where we need to create 24 new records from a single record in an internal table by incrementing FISCPER. We tried doing this with INSERT, but DML statements are not allowed in BW/4HANA transformations. I see that some functions supported in HANA, like creating temporary tables or INSERT INTO, do not work in AMDP transformations in BW/4HANA. Is there a way that I am missing, or any other way we could achieve this?

 

Regards,

Manisha