Shrink your Tables with SAP HANA SP08

abanip · ‎10-16-2014

Shrink your Tables with SAP HANA SP08

Abani Pattanayak, SAP HANA COE (Delivery)

Jako Blagoev, SAP HANA COE (AGS)

Introduction:

Yes, with HANA SP08 you can significantly reduce size of your FACT tables significantly. Depending on the the size of the primary key and cardinality of the dataset, you can get significant (up to 40%) savings in static in static memory usage.

This savings in memory is compared to HANA SP07 or earlier revisions.

So what's the catch?

There is no catch.

The saving is based on how the primary key of the table is stored in HANA. Please check the biggest FACT table in your SP07 or SP06 HANA database, the size of the primary key will be around 30 - 40 % of the total size of the table.

With HANA SP08, we can eliminate this 30 - 40% memory taken by the primary key. So there is no negative performance impact on query performance.

Show me the Money (What's the trick)?

You need to recreate the primary key of the table with INVERTED HASH option.

CREATE COLUMN TABLE "SAPSR3"."MY_FACT_TABLE"(

"STORID" NVARCHAR(10),

"ORDERID" NVARCHAR(15),

"SEQ" NVARCHAR(10),

"CALMONTH" NVARCHAR(6),

"CALDAY" NVARCHAR(8),

"COUNTRY" NVARCHAR(3),

"REGION" NVARCHAR(3),

..

PRIMARY KEY INVERTED HASH ("STORID",

"ORDERID",

"SEQ",

"CALMONTH",

"CALDAY",

"COUNTRY",

"REGION"))

WITH PARAMETERS ('PARTITION_SPEC' = 'HASH 8 STORID')

;

You can use ALTER TABLE command to drop and recreate primary key of the table.

However, if you have a scale-out system or a really BIG fact table with billions of record, We'd highly recommend to create a NEW table with INVERTED HASH Primary Key and then copy the data over to the new table. Then rename the tables.

Result

The following is the result of updating the primary key in a customer project. As you see below, the saving in static memory is around 531GB over 1980GB.

So overall, there is a saving of at least 2-nodes (0.5 TB each) in a 9-node scale out system.

The best part of this exercise, there is no negative performance impact on query performance.

Note: I'd suggest you review your existing system and evaluate if you can take advantage of this feature.

lbreddemann · ‎10-16-2014

Nice to know that the feature that had been lurking since SPS 6 now seems to be officially supported.

Did you happen to perform some performance tests on the hashed concat primary key?

I would guess, that range scans could possibly be a bit slower, since the hashes likely would be in the same order than the original values.

In case there are no negatives and the hashed concat is always better than the inverted concat primary key, why is this not the default setting? :smile:

Former Member · ‎10-18-2014

OK I'll bite :smile:

Why do you have a primary key on a fact table? Kimball discusses it here:

http://www.kimballgroup.com/2006/07/design-tip-81-fact-table-surrogate-key/

I don't see how a surrogate primary key on a fact table would help with HANA.

Former Member · ‎10-19-2014

Good point - even for traditional databases SAP BW never used a primary key / primary index on fact tables (/BIC/F*, /BIC/E*). On the other hand the INVERTED HASH approach should also help in other scenarios.

abanip · ‎10-20-2014

The size of the composite primary key (when it is not inverted hash) can be significantly reduced with a surrogate primary key even in SAP HANA.

Other than that I don't see any other reason of surrogate primary key in FACT table.

abanip · ‎10-20-2014

Agreed. But SAP BW DSOs use composite primary key and in those scenarios the savings could be significant.

However, I'd be careful touching BW underlying tables without due diligence

Former Member · ‎10-20-2014

Yes but why have a primary key at all?

abanip · ‎10-20-2014

The purpose of the PRIMARY KEY is to make sure uniqueness (set of columns) of the data record in the table

Now depending on how the surrogate primary key is generated (e.g. if it based on some kind of hash function of the composite columns), this can ensure uniqueness of the record.

I believe these are constraints specific to application requirement and can vary from system to system.

abanip · ‎10-20-2014

I think it is probably too early to say there are NO negatives for all use cases. For our scenario, it worked out pretty well. I believe this is going to be further tested in coming months by SAP applications (BW, ERP etc.). So depending on the results, it may (not) be the default setting for certain cases.

lbreddemann · ‎10-20-2014

But you wrote that "there is no catch"... :???:

BTW: PM announced that the feature will only be released with SPS9...

ryosukemouri · ‎11-05-2014

Just to let you know our SUM process failed after converting REPOSRC to inverted hash key. Not sure the reason why but be careful.

[9480]{200040}[36/52815066] 2014-11-05 03:39:01.159545 i TraceContext TraceContext.cpp(00718) : UserName=SAPSID, ApplicationUserName=DDIC, ApplicationName=ABAP:SID, ApplicationSource=RSINCGEN:560, EppRootContextId=000001425F4E414D453E3C504152414D, EppTransactionId=E59464E428A3F10F848900505695392B, EppConnectionId=00000000000000000000000000000000, EppConnectionCounter=0, EppComponentName=SID/iwdfvmXXXX_SID_02, EppAction=RSINCGEN

[9480]{200040}[36/52815066] 2014-11-05 03:39:01.159526 e table_update TableUpdate.cpp(09808) : convert() concatenated old key values of table 'SAPSID:REPOSRC' failed

[9480]{200040}[36/52815066] 2014-11-05 03:39:01.159599 e table_update TableUpdate.cpp(01734) : rollback transaction because rc=1513 is set

abanip · ‎11-05-2014

Can you please explain your SUM process? Are you getting this error while executing a Query with SUM(X) on the REPOSRC table?

Former Member · ‎11-05-2014

Abani - he meant Software Update Manager, I think. REPOSRC is the ABAP code repository.

Ryosuke, please don't change system generated tables within the SAP BW/ERP environment. If the SAP dev team are ready to do that, they'll do this themselves :smile:

abanip · ‎11-06-2014

As John pointed out. please do not touch the SYSTEM generated tables.

@John - Thank you

Former Member · ‎11-09-2014

Hi Mouri, What HANA Version you are running on ?

http://service.sap.com/sap/support/notes/1850112

As per this note REPOSRC is a column store

In 7.40 Support Package 5, the storage type is changed to column store for the following tables:

EDIDC

EDIDS

INDX

REPOSRC

So if your HANA is a lower version, It must a ROW Table but no Column Table.

I will only change the index type if you see a significant space use of index space, other wise I will not attempt to change to inverted hash,but it is easy to change during the migration, so my question is what indexes are defined on REPOSRC Table?

Thanks

Srikanth M

lbreddemann · ‎11-09-2014

Sorry, but you're mixing up different things here:

The inverted hash is one of two possible implementations for a primary key of a column store table.

Indexes are a different object and not connected to the inverted hash implementation.

It's correct that it is generally not supported to change table storage parameters like ROW/COLUMN for SAP delivered tables, however constructing a technical reason for the error message out of it is a mistake.

SAP product management announced the inverted hash primary key implementation to be available as of SPS9, no it's definitively nothing to put in production in an earlier release (at least not without specific support for this by SAP).

As with most features, it's highly recommendable to understand and try out features first (on a test system that can be rebuilt at every point in time without any negative impact).

Concerning the error message itself: I'd consider this a bug and recommend to have a support incident opened for this. There's no conceptual reason why any ABAP process should have issues with the way that a primary key constraint is implemented on SAP HANA.

Let's be very clear here: this is a relatively new feature and there's no recommendation by SAP HANA development to go and implement it (yet).

A much better option to save on storage space is to get rid of concatenated attributes that had been created for joins in pre-SP8 versions.

cheers,

Lars

Former Member · ‎11-19-2014

Hi - just checking whether "INVERTED HASH" is also supported for design-time tables - i.e. a table created via generating a .hdbtable file and activating? All of our tables are created using this method, and we do have some rather large primary keys we would like to test this out on, but I am unable to determine the corect syntax to use in this case, as the primary key definition using hdbtable file looks very much unlike the "create table" as referenced in the example above. Something like:

table.primaryKey.pkcolumns = ["DAY_KEY","DC_KEY","ITEM_KEY"];

Any pointers would be appreciated.

Thanks

-Trevor

lbreddemann · ‎11-19-2014

Hi Trevor,

please wait for the feature to be fully supported in SP9.

- Lars

justin_molenaur2 · ‎02-10-2015

This is where I am kind of thinking too...

In some customers, SLT pretty much already handles the uniqueness in a table, already deciding whether the new data is an insert, update or delete and issuing the appropriate command once it gets to the HANA box. So for sidecar scenarios using SLT, given that performance is the same, wouldn't it be reasonable to eliminate the PK altogether?

If I get some time maybe a test is in order :smile:

Regards,

Justin

abanip · ‎02-10-2015

That's certainly a possibility as long as the data-integrity can't be maintained. (in this scenario SLT will take care of that). However, if you have to evaluate what's the saving you are having by eliminating the primary keys. Using inverted primary key, you are getting pretty significant savings already.

lbreddemann · ‎02-10-2015

Sorry guys,

just out of interest:

how does SLT "take care" of the consistency here?

Sure, the happy path of code execution won't likely lead to duplicate entries.

But how would you even know about duplicates when there's no primary key constraint in place?

While it's surely not necessary to have the PK on the replication to keep the single true record consistent (that happens in the source system) it would be still a nice thing to know when the data is not consistent in itself anymore.

And SLT did not have an "auto-fix" function, last time I looked...

As for the fact tables: correct, there is no actual need to have a primary key here as we can deal with duplicates during the aggregation.

- Lars

justin_molenaur2 · ‎02-11-2015

Here we are again gentlemen, good stuff.

-->how does SLT "take care" of the consistency here?<--

I think that might the wrong term. SLT "SHOULD" deliver data to the target table that will not violate the existing constraints of the source table - which of course the DDL should also match. That has nothing to do with actions that can happen outside that replication process, so technically SLT does not "take care" of consistency in the target table, I agree with you on that Lars. There should however, be reasonable assurance that without a PK, SLT should (there's that word again) act as a proxy for constraint handling.

SLT doesn't have an auto-fix function, but I believe there is some functionality I have heard of where you can do a complete table compare between source and target, maybe tobias.koebler would have more detail on that one.

-->But how would you even know about duplicates when there's no primary key constraint in place?<--

That's the hard part and major risk of the trade off. The entire context of this original post was to minimize the amount of memory used for indices. My interest on behalf of a customer was a similar thing, but I was thinking even farther than the hash conversion.

To give an example, I took a standard SLT replicated table, copied it and converted the PK index to hash and then also took another copy and straight away dropped the PK altogether. Ensuring it was all loaded to memory, these are some of the results I got from a memory size perspective. For reference, this table was about 900k entries.

So similar to Abani's results, I saw a 35% reduction in index size with the hash conversion. Taking it further and dropping the PK took this to 68% reduction, or a total size that was 3.1x smaller than the original. That would be fairly significant with larger volumes.

For the performance angle, I did a single key search (WHERE "MATNR" = 'X'), range search (WHERE "MATNR" BETWEEN 'X' and 'Y') and inlist search (WHERE MATNR IN ('X', 'Y') to see what we got.

Surprisingly, all three methods performed more or less than same at a consistent 15ms response time. Now, given that it's such a low response time and table is less than 1m records, it needs to be scaled up to give more accurate results. Just thought this was an initial look.

Now, I am not advocating or even SUGGESTING that PK's be dropped, this was simply an investigation on size savings and trade off with performance. The constraint benefit that a PK brings doesn't factor into the two above topics. The only reason we get into this is because HANA doesn't allow you to define a PK WITHOUT these indices.

Happy HANA,

Justin

Former Member · ‎02-11-2015

I had similar results at a customer where we had a 1TB cluster table for pricing conditions. I rebuilt the table with a hash PK index and it was 35% smaller.

Now be very careful with table access times, because HANA's single-user access times are not a good representation of access cost. In single-user mode, you don't know how much of the NUMA memory throughput of the appliance is being used.

What you really need to do is load test the system with many concurrent threads. This will give you an idea of the attribute vector scan cost. With range partitions and a hash PK on just one table, we increased aggregate SD throughput by 3x, despite having very similar (1ms) single user access times.

Former Member · ‎03-24-2015

Greetings . . .

With the release of SPS9 we now have a KB article discussing the feature:

http://service.sap.com/sap/support/notes/2109355 and I have seen an SAP AGS TPO report recommending the feature as well for BW on HANA.

In the KB article it discusses the potential downsides (everything is trade off) of switching to Inverted Hashes - in particular hash collisions. What is the impact on the system if a table has a large number of hash collisions - additional performance impacts? Less memory savings? The KB Article points out http://service.sap.com/sap/support/notes/1969700 for SQL to detect collisions. But neither discusses what should be done when a table has a 'large' number.

Any insights would be greatly appreciated.

Keith.

lbreddemann · ‎03-24-2015

Hey Keith.

Yes, hash collisions are a performance penalty and a rather minor additional space impact.

Since you cannot really change anything about the hash collisions, the information about the HASH collisions is really more of informative nature right now.

It might be the missing piece of explanation why a certain primary key access is slower than expected.

To the point of "what large means": this completely depends on the hashing function used internally here. At some point hashing functions tend to create collisions - but I don't know the actual hashing function used or the threshold for collisions to be expected.

Recommending the usage of the feature in very well known scenarios like BW is definitively a good idea in order to save space.

I know, this probably aren't the answers you were looking for, but that's what I can say to this.

- Lars

Former Member · ‎10-28-2015

Hi Lars,

Any news on the hdbtable syntax to specify inverted hash for the primary key?

As of HANA SPS10_102 documentation, inverted hash is not mentioned in the hdbtable syntax reference.

Thanks,

Daniel.

lbreddemann · ‎10-28-2015

Not that I would know of.

Better ask folks like thomas.jung2 or somebody else from the SAP HANA PM about that.

There's a SAP internal JAM page you might want to use for that.

- Lars

martin_chambers · ‎12-22-2015

Hi Lars,

what's the current official situation in SAP BW? In SPS9, SPS10, SPS11 or SPSXY?
Is the inverted hash going to be the default setting or merely an SAP sanctioned alternative that you have to implement manually for every datamart?

Cheers,

Martin

christoph_gilde · ‎10-16-2017

As of October 2017: still not supported in BW, and several other restrictions