ABAP for newbies - Importance of BINARY SEARCH

former_member571549 · ‎02-06-2020

Hi,

When you start to code ABAP you are probably not paying much atention to the table typekind you use (STANDARD, SORTED, HASHED), and you usually forgot to use the statement addition BINARY SEARCH.

Most probably you know that the use of this addition improves performance, but when you're starting you may be not aware of how much it does. When using LOOP WHERE the BINARY SEARCH is performed under the hood, so for newbies it can become even more complex to understand when and why ABAP is doing this search optimal for you.

I coded a little example testing 4 cases:

LOOP WHERE over a STANDARD TABLE (Sorted with a sort statement)

LOOP WHERE over a SORTED TABLE

READ TABLE BINARY SEARCH + LOOP FROM INDEX over a STANDARD TABLE (Sorted with a sort statement)

READ TABLE BINARY SEARCH + LOOP FROM INDEX over a SORTED TABLE

I prepared the variables below for my program:

  TYPES: BEGIN OF lt_head,

           docnr TYPE n LENGTH 10,

           gjahr TYPE gjahr,

         END OF lt_head,



         BEGIN OF lt_position,

           docnr TYPE n LENGTH 10,

           gjahr TYPE gjahr,

           posnr TYPE n LENGTH 3,

         END OF lt_position.



  DATA: li_head TYPE STANDARD TABLE OF lt_head

                WITH HEADER LINE,

        li_positions TYPE STANDARD TABLE OF lt_position

                     WITH HEADER LINE,

        li_positions_sorted TYPE SORTED TABLE OF lt_position

                            WITH NON-UNIQUE KEY docnr gjahr posnr

                            WITH HEADER LINE.

LI_HEAD will contain 5000 entries and each of this entries will contain 5 entries in the LI_POSITIONS and LI_POSITIONS_SORTED tables, the program is going to perform just the LOOPs, no code inside, as show below (You can ignore the macros i used to get runtime measurement):

  START_MEASUREMENT 'SORT'.

  SORT li_positions BY docnr gjahr posnr.

  END_MEASUREMENT 'SORT'.



  START_MEASUREMENT 'LOOP WHERE'.

  LOOP AT li_head.

    LOOP AT li_positions WHERE docnr = li_head-docnr

                           AND gjahr = li_head-gjahr.

    ENDLOOP.

  ENDLOOP.

  END_MEASUREMENT 'LOOP WHERE'.



  START_MEASUREMENT 'LOOP WHERE SORTED TABLE'.

  LOOP AT li_head.

    LOOP AT li_positions_sorted WHERE docnr = li_head-docnr

                                  AND gjahr = li_head-gjahr.

    ENDLOOP.

  ENDLOOP.

  END_MEASUREMENT 'LOOP WHERE SORTED TABLE'.



  START_MEASUREMENT 'BINARY SEARCH'.

  LOOP AT li_head.

    READ TABLE li_positions WITH KEY docnr = li_head-docnr

                                     gjahr = li_head-gjahr

                            BINARY SEARCH.



    IF sy-subrc = 0.

      LOOP AT li_positions FROM sy-tabix.

        IF li_positions-docnr <> li_head-docnr OR

           li_positions-gjahr <> li_head-gjahr.

          EXIT.

        ENDIF.

      ENDLOOP.

    ENDIF.

  ENDLOOP.

  END_MEASUREMENT 'BINARY SEARCH'.



  START_MEASUREMENT 'BINARY SEARCH SORTED TABLE'.

  LOOP AT li_head.

    READ TABLE li_positions_sorted WITH KEY docnr = li_head-docnr

                                            gjahr = li_head-gjahr

                                   BINARY SEARCH.



    IF sy-subrc = 0.

      LOOP AT li_positions_sorted FROM sy-tabix.

        IF li_positions_sorted-docnr <> li_head-docnr OR

           li_positions_sorted-gjahr <> li_head-gjahr.

          EXIT.

        ENDIF.

      ENDLOOP.

    ENDIF.

  ENDLOOP.

  END_MEASUREMENT 'BINARY SEARCH SORTED TABLE'.

The execution of the program gives this results:

As expected LOOP WHERE is giving the worst performance by far, the reason why this LOOP is performing so bad is ABAP have not enough information to perform the BINARY SEARCH optimization (I don't know about ABAP internals, but i'm pretty sure it must do this kind of optimization).

The difference between first and second LOOP, that helps ABAP optimize the search, is the table typekind that lets it know that the internal table is sorted, as important as know the table is sorted is to know that is sorted by the proper fields, and ABAP knows this because we specified the key in the table definition, as you can see in the code my LOOP where clause is using the fields of the key. You must note that, as in the example, using full key is not required to get the optimization.

So even having the standard table sorted by the same fields ABAP cannot be sure of that because our table is not defined that way, this forces ABAP to do a full search, wherever you find a LOOP is having poor performance you can check that you're table is defined as sorted and that you specified the right key for the table that allows this optimization.

I also tested to do a manual BINARY SEARCH and start looping from that position until i found a different combination for my key fields, this is a solution i usually used when i didn't understand how ABAP works, i surprisingly found that it performs slightly better, may be useful if you need to squeeze until the last bit of performance.

You can find full code at this github repository: https://github.com/jordirosa/ABAP4Newbies

Regards.

keremkoseoglu · ‎02-06-2020

You might be interested in the following performance test where Binary Search is compared against alternative approaches in nested loops: https://keremkoseoglu.com/2015/07/09/abap-nested-loop-performance/ .

Long story short: Data types within internal tables seem to affect the performance outcome. But in average, Binary Search seems to be the slowest method (compared to Hashed Table, Paralel Cursor and Sorted Table).

Sandra_Rossi · ‎02-06-2020

Maybe you can add a README to your repository to explain that people may install it with abapGit (newbies are probably not aware of its existence).

michael_piesche · ‎02-06-2020

When it comes to table access, you always have to know the data and the number of accesses, in order to make a well informed decision about what type or better what search algorithm to choose. And for each search algorithm, in almost all cases, there is also a ‘sorting’ algorithm that has to be performed before hand.

Examples:

You have an unsorted table with 1’000’000 (1 million) entries and you are looking for just one (!) specific entry
=> dont bother with sorting and using binary search or building buckets and using a hash table.
Because this will allways take at least O(n+1) time, where n is the number of items in your table.
Where as the sequential search will only in the worst case give you a performance of O(n) time.
You have two tables with corresponding data and you need to access all entries (!) and map the records from table one to those of table two
=> In this case, a parallel cursor will be the fastest, where both tables need to be sorted first
If n and m are the sizes of the two tables, then sorting the tables will take O(nlogn) and O(mlogm) time in the worst cases, but only O(n+m) time to do the nested loop
The only problem I have with the parallel cursor, is that you have to implement and test it yourself, because it does make a difference on the implementation if you have a 1:1 or a 1:n match, whereas a sorted or hash table search does that for you already and looks a little cleaner
For anything in between I would most likely start out with a sorted table first, because I also might need it sorted or I might constantly add and delete records. But if I dont need any of those two functions beside the search and I realize it could perform better, than I would switch to hashed.
I very seldom would do a binary search on a standard table that has been sorted seperatly, because you and other developers always need to be aware of this, and also adding items to your table always requires a resort

Update:

jordirosa in those more "complex" examples, you could also try out having not only primary keys but also multiple secondary keys and whether the table is sorted or not, and also see what this does in terms of performance to not only search but also insert and delete functions.

And keep a binary search statement solely to a table type that is not sorting by it self, but has to be sorted by coding. Using the binary search statement on a sorted table type with non- or unique keys can lead to wrong results, if you dont access it by its keys.
Altough the binary search statement on a sorted table is pretty much the same as a search by primary or secondary key on a standard or sorted table, there still can be some differences in performance, especially insert and delete, due to overhead in indizes but also how SAP has internally programmed those functions.

former_member571549 · ‎02-06-2020

Thanks for sharing the article, is really interesting, i will take a look to more posts in your web.

former_member571549 · ‎02-06-2020

Coudn’t agree more, i will update the repository.

Thank you 🙂

former_member571549 · ‎02-06-2020

Good explanation, maybe i'll write another post with some more complex examples so i can get more feedback.

Thanks for sharing your knowledge.

Jelena · ‎02-06-2020

I very seldom would do a binary search on a standard table that has been sorted separately, because you and other developers always need to be aware of this, and also adding items to your table always requires a resort

Bingo. SORTED table type is already using binary search algorithm. I've ran this through different scenarios before and, like kerem.koseoglu , arrived at a conclusion that on average, SORTED table performance is not worse than when using BINARY SEARCH. Interestingly, I ran the same program probably 30 times in a row and the result fluctuated by a single-digit percent between SORTED and BINARY SEARCH option. Go figure.

But the main thing is: it's not just about the performance. BINARY SEARCH can offer a marginal improvement in runtime while it's effect on the code maintenance and "cleanliness" is far worse. If you forget to sort (or re-sort) the table, you could simply get a wrong result.

For this reason, when working on the 2nd edition on the ABAP Introduction book, we have revised the section on BINARY SEARCH to advise against using it. We still had to explain it, but made it clear this is not a recommended solution anymore for multiple reasons.

Thank you!

former_member571549 · ‎02-07-2020

I was planning to write about the problem you mention, i wanted to show the debugger option that checks that a table is sorted before BINARY SEARCH, i know it's not perfect because if your table is sorted just by chance the check will be ok, but this option can help reduce mistakes.

beyhan_meyrali

Hi,

It is safer to add binary search conditions to loop where, otherwise you may end up reading other records which satisfies index and other where condition parameters.

  START_MEASUREMENT 'BINARY SEARCH SORTED TABLE'.
  LOOP AT li_head.
    READ TABLE li_positions_sorted WITH KEY docnr = li_head-docnr
                                            gjahr = li_head-gjahr
                                   BINARY SEARCH.

    IF sy-subrc = 0.
      LOOP AT li_positions_sorted FROM sy-tabix
where docnr = li_head-docnr and gjahr = li_head-gjahr.
        IF li_positions_sorted-docnr <> li_head-docnr OR
           li_positions_sorted-gjahr <> li_head-gjahr.
          EXIT.
        ENDIF.
      ENDLOOP.
    ENDIF.
  ENDLOOP.
  END_MEASUREMENT 'BINARY SEARCH SORTED TABLE'.

Sandra_Rossi

beyhan_meyrali Sorry but again you don't understand what an internal table of TYPE SORTED is:

DATA li_positions_sorted TYPE SORTED TABLE OF lt_position WITH NON-UNIQUE KEY docnr gjahr posnr.

TYPE SORTED TABLE means that you don't need to use FROM sy-tabix in LOOP AT, and so you don't need READ TABLE ... BINARY SEARCH (also it's non-sense to indicate BINARY SEARCH with an internal table declared TYPE SORTED TABLE, if you prefer you can add a comment to indicate it's an internal table TYPE SORTED but that's all you need to do).

I hope you can read the ABAP documentation and understand how TYPE SORTED TABLE works.

beyhan_meyrali

Hi Sandra,

Let me clarify,

With binary search you find the index, and then pass it to loop and it is best to add key fields to loop where condition, since loop will start from tabix, it will only make your code safer without performance lose.

I hope it is a bit clear for you. Maybe you should also read and try to understand before you label me :).

Sandra_Rossi

To other people who read the blog posts of SAP people, experts and ABAP documentation: with an internal table of TYPE SORTED, LOOP AT is using implicitly a binary search if you indicate the first key fields as explained here itab - Optimizing the WHERE Condition - ABAP Keyword Documentation (sap.com), especially "Prerequisites for the Optimization of Sorted Keys", none of the examples use READ TABLE ... BINARY SEARCH.

Also, SAP explicitly recommends to NOT USE READ TABLE - BINARY SEARCH, instead use sorted tables (in the sense of declared with TYPE SORTED TABLE OF) :

"Instead of using the addition BINARY SEARCH, it is recommended that work is done either with sorted tables or with sorted secondary table keys."

ABAP for newbies - Importance of BINARY SEARCH

Get Started with the ABAP Development Tools for SAP NetWeaver

Become an ABAP in Eclipse Feature Explorer and earn the Explorer Badge

Six kinds of debugging tips to find the source code where the message is raised