Application Development Blog Posts
Learn and share on deeper, cross technology development topics such as integration and connectivity, automation, cloud extensibility, developing at scale, and security.
cancel
Showing results for 
Search instead for 
Did you mean: 
Hi,

When you start to code ABAP you are probably not paying much atention to the table typekind you use (STANDARD, SORTED, HASHED), and you usually forgot to use the statement addition BINARY SEARCH.

Most probably you know that the use of this addition improves performance, but when you're starting you may be not aware of how much it does. When using LOOP WHERE the BINARY SEARCH is performed under the hood, so for newbies it can become even more complex to understand when and why ABAP is doing this search optimal for you.

I coded a little example testing 4 cases:

  1. LOOP WHERE over a STANDARD TABLE (Sorted with a sort statement)

  2. LOOP WHERE over a SORTED TABLE

  3. READ TABLE BINARY SEARCH + LOOP FROM INDEX over a STANDARD TABLE (Sorted with a sort statement)

  4. READ TABLE BINARY SEARCH + LOOP FROM INDEX over a SORTED TABLE


I prepared the variables below for my program:
  TYPES: BEGIN OF lt_head,
docnr TYPE n LENGTH 10,
gjahr TYPE gjahr,
END OF lt_head,

BEGIN OF lt_position,
docnr TYPE n LENGTH 10,
gjahr TYPE gjahr,
posnr TYPE n LENGTH 3,
END OF lt_position.

DATA: li_head TYPE STANDARD TABLE OF lt_head
WITH HEADER LINE,
li_positions TYPE STANDARD TABLE OF lt_position
WITH HEADER LINE,
li_positions_sorted TYPE SORTED TABLE OF lt_position
WITH NON-UNIQUE KEY docnr gjahr posnr
WITH HEADER LINE.

LI_HEAD will contain 5000 entries and each of this entries will contain 5 entries in the LI_POSITIONS and LI_POSITIONS_SORTED tables, the program is going to perform just the LOOPs, no code inside, as show below (You can ignore the macros i used to get runtime measurement):
  START_MEASUREMENT 'SORT'.
SORT li_positions BY docnr gjahr posnr.
END_MEASUREMENT 'SORT'.

START_MEASUREMENT 'LOOP WHERE'.
LOOP AT li_head.
LOOP AT li_positions WHERE docnr = li_head-docnr
AND gjahr = li_head-gjahr.
ENDLOOP.
ENDLOOP.
END_MEASUREMENT 'LOOP WHERE'.

START_MEASUREMENT 'LOOP WHERE SORTED TABLE'.
LOOP AT li_head.
LOOP AT li_positions_sorted WHERE docnr = li_head-docnr
AND gjahr = li_head-gjahr.
ENDLOOP.
ENDLOOP.
END_MEASUREMENT 'LOOP WHERE SORTED TABLE'.

START_MEASUREMENT 'BINARY SEARCH'.
LOOP AT li_head.
READ TABLE li_positions WITH KEY docnr = li_head-docnr
gjahr = li_head-gjahr
BINARY SEARCH.

IF sy-subrc = 0.
LOOP AT li_positions FROM sy-tabix.
IF li_positions-docnr <> li_head-docnr OR
li_positions-gjahr <> li_head-gjahr.
EXIT.
ENDIF.
ENDLOOP.
ENDIF.
ENDLOOP.
END_MEASUREMENT 'BINARY SEARCH'.

START_MEASUREMENT 'BINARY SEARCH SORTED TABLE'.
LOOP AT li_head.
READ TABLE li_positions_sorted WITH KEY docnr = li_head-docnr
gjahr = li_head-gjahr
BINARY SEARCH.

IF sy-subrc = 0.
LOOP AT li_positions_sorted FROM sy-tabix.
IF li_positions_sorted-docnr <> li_head-docnr OR
li_positions_sorted-gjahr <> li_head-gjahr.
EXIT.
ENDIF.
ENDLOOP.
ENDIF.
ENDLOOP.
END_MEASUREMENT 'BINARY SEARCH SORTED TABLE'.

The execution of the program gives this results:



As expected LOOP WHERE is giving the worst performance by far, the reason why this LOOP is performing so bad is ABAP have not enough information to perform the BINARY SEARCH optimization (I don't know about ABAP internals, but i'm pretty sure it must do this kind of optimization).

The difference between first and second LOOP, that helps ABAP optimize the search, is the table typekind that lets it know that the internal table is sorted, as important as know the table is sorted is to know that is sorted by the proper fields, and ABAP knows this because we specified the key in the table definition, as you can see in the code my LOOP where clause is using the fields of the key. You must note that, as in the example, using full key is not required to get the optimization.

So even having the standard table sorted by the same fields ABAP cannot be sure of that because our table is not defined that way, this forces ABAP to do a full search, wherever you find a LOOP is having poor performance you can check that you're table is defined as sorted and that you specified the right key for the table that allows this optimization.

I also tested to do a manual BINARY SEARCH and start looping from that position until i found a different combination for my key fields, this is a solution i usually used when i didn't understand how ABAP works, i surprisingly found that it performs slightly better, may be useful if you need to squeeze until the last bit of performance.

You can find full code at this github repository: https://github.com/jordirosa/ABAP4Newbies

Regards.
12 Comments
keremkoseoglu
Contributor
You might be interested in the following performance test where Binary Search is compared against alternative approaches in nested loops: https://keremkoseoglu.com/2015/07/09/abap-nested-loop-performance/ .

Long story short: Data types within internal tables seem to affect the performance outcome. But in average, Binary Search seems to be the slowest method (compared to Hashed Table, Paralel Cursor and Sorted Table).
Sandra_Rossi
Active Contributor
Maybe you can add a README to your repository to explain that people may install it with abapGit (newbies are probably not aware of its existence).
michael_piesche
Active Contributor

When it comes to table access, you always have to know the data and the number of accesses, in order to make a well informed decision about what type or better what search algorithm to choose. And for each search algorithm, in almost all cases, there is also a ‘sorting’ algorithm that has to be performed before hand.

Examples:

  1. You have an unsorted table with 1’000’000 (1 million) entries and you are looking for just one (!) specific entry
    => dont bother with sorting and using binary search or building buckets and using a hash table.
    Because this will allways take at least O(n+1) time, where n is the number of items in your table.
    Where as the sequential search will only in the worst case give you a performance of O(n) time.
  2. You have two tables with corresponding data and you need to access all entries (!) and map the records from table one to those of table two
    => In this case, a parallel cursor will be the fastest, where both tables need to be sorted first
    If n and m are the sizes of the two tables, then sorting the tables will take O(nlogn) and O(mlogm) time in the worst cases, but only O(n+m) time to do the nested loop
    The only problem I have with the parallel cursor, is that you have to implement and test it yourself, because it does make a difference on the implementation if you have a 1:1 or a 1:n match, whereas a sorted or hash table search does that for you already and looks a little cleaner
  3. For anything in between I would most likely start out with a sorted table first, because I also might need it sorted or I might constantly add and delete records. But if I dont need any of those two functions beside the search and I realize it could perform better, than I would switch to hashed.
    I very seldom would do a binary search on a standard table that has been sorted seperatly, because you and other developers always need to be aware of this, and also adding items to your table always requires a resort

Update:

jordirosa in those more "complex" examples, you could also try out having not only primary keys but also multiple secondary keys and whether the table is sorted or not, and also see what this does in terms of performance to not only search but also insert and delete functions.

And keep a binary search statement solely to a table type that is not sorting by it self, but has to be sorted by coding. Using the binary search statement on a sorted table type with non- or unique keys can lead to wrong results, if you dont access it by its keys.
Altough the binary search statement on a sorted table is pretty much the same as a search by primary or secondary key on a standard or sorted table, there still can be some differences in performance, especially insert and delete, due to overhead in indizes but also how SAP has internally programmed those functions.

Thanks for sharing the article, is really interesting, i will take a look to more posts in your web.
Coudn’t agree more, i will update the repository.

Thank you 🙂
0 Kudos
Good explanation, maybe i'll write another post with some more complex examples so i can get more feedback.

Thanks for sharing your knowledge.
Jelena
Active Contributor
I very seldom would do a binary search on a standard table that has been sorted separately, because you and other developers always need to be aware of this, and also adding items to your table always requires a resort

Bingo. SORTED table type is already using binary search algorithm. I've ran this through different scenarios before and, like kerem.koseoglu , arrived at a conclusion that on average, SORTED table performance is not worse than when using BINARY SEARCH. Interestingly, I ran the same program probably 30 times in a row and the result fluctuated by a single-digit percent between SORTED and BINARY SEARCH option. Go figure.

But the main thing is: it's not just about the performance. BINARY SEARCH can offer a marginal improvement in runtime while it's effect on the code maintenance and "cleanliness" is far worse. If you forget to sort (or re-sort) the table, you could simply get a wrong result.

For this reason, when working on the 2nd edition on the ABAP Introduction book, we have revised the section on BINARY SEARCH to advise against using it. We still had to explain it, but made it clear this is not a recommended solution anymore for multiple reasons.

Thank you!
0 Kudos
I was planning to write about the problem you mention, i wanted to show the debugger option that checks that a table is sorted before BINARY SEARCH, i know it's not perfect because if your table is sorted just by chance the check will be ok, but this option can help reduce mistakes.
beyhan_meyrali
Active Contributor
0 Kudos

Hi,

It is safer to add binary search conditions to loop where, otherwise you may end up reading other records which satisfies index and other where condition parameters.

  START_MEASUREMENT 'BINARY SEARCH SORTED TABLE'.
  LOOP AT li_head.
    READ TABLE li_positions_sorted WITH KEY docnr = li_head-docnr
                                            gjahr = li_head-gjahr
                                   BINARY SEARCH.

    IF sy-subrc = 0.
      LOOP AT li_positions_sorted FROM sy-tabix
where docnr = li_head-docnr and gjahr = li_head-gjahr.
        IF li_positions_sorted-docnr <> li_head-docnr OR
           li_positions_sorted-gjahr <> li_head-gjahr.
          EXIT.
        ENDIF.
      ENDLOOP.
    ENDIF.
  ENDLOOP.
  END_MEASUREMENT 'BINARY SEARCH SORTED TABLE'.
Sandra_Rossi
Active Contributor
0 Kudos

beyhan_meyrali Sorry but again you don't understand what an internal table of TYPE SORTED is:

DATA li_positions_sorted TYPE SORTED TABLE OF lt_position WITH NON-UNIQUE KEY docnr gjahr posnr.

TYPE SORTED TABLE means that you don't need to use FROM sy-tabix in LOOP AT, and so you don't need READ TABLE ... BINARY SEARCH (also it's non-sense to indicate BINARY SEARCH with an internal table declared TYPE SORTED TABLE, if you prefer you can add a comment to indicate it's an internal table TYPE SORTED but that's all you need to do).

 

I hope you can read the ABAP documentation and understand how TYPE SORTED TABLE works.

beyhan_meyrali
Active Contributor
0 Kudos

Hi Sandra,

Let me clarify,

With binary search you find the index, and then pass it to loop and it is best to add key fields to loop where condition, since loop will start from tabix, it will only make your code safer without performance lose.

I hope it is a bit clear for you. Maybe you should also read and try to understand before you label me :). 

Sandra_Rossi
Active Contributor

To other people who read the blog posts of SAP people, experts and ABAP documentation: with an internal table of TYPE SORTED, LOOP AT is using implicitly a binary search if you indicate the first key fields as explained here itab - Optimizing the WHERE Condition - ABAP Keyword Documentation (sap.com), especially "Prerequisites for the Optimization of Sorted Keys", none of the examples use READ TABLE ... BINARY SEARCH.

Also, SAP explicitly recommends to NOT USE READ TABLE - BINARY SEARCH, instead use sorted tables (in the sense of declared with TYPE SORTED TABLE OF) :

"Instead of using the addition BINARY SEARCH, it is recommended that work is done either with sorted tables or with sorted secondary table keys."