Replicating large SAP tables to AWS S3 using SAP Data Services

najari
Participant

Dear SAP data and AWS S3 experts,

Replicating table ACDOCA from our S/4HANA system to our data lake (AWS S3 bucket) using SAP Data Services is not working.
While SAP Data Services is capable of replicating large tables from an SAP source system (70 GB, approx. 8 hours for the replication job to complete), AWS S3 limits a single PUT to 5 GB. See: https://aws.amazon.com/s3/faqs/#:~:text=Individual%20Amazon%20S3%20objects%20can,single%20PUT%20is%2....

Replicating the same table to a flat file on my local machine worked.

Even though this seems to be an issue on the AWS S3 side, I wonder if any of you have experienced the same and implemented a solution with SAP Data Services, e.g. a table split.

I look forward to hearing your thoughts and ideas!

Best regards,

Saleem

Nawfal
Active Participant

Hi Saleem,

You would need to split the large output file into parts of 5 GB or smaller and then transfer them to the S3 bucket. It's going to be easier if you can leverage external OS commands like split on Linux to achieve that; otherwise, you can split the table to produce multiple files. One suggestion is to add a Row_Generation to the query transform and use a while loop to produce a new file with a given number of records in each iteration until the total record count is covered, along the lines of the sketch below.
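A rough sketch of the split idea outside of Data Services (plain Python, not a job script; the file names and the 5 GB threshold are placeholders, not values from your system) could look like this:

# Split an already-extracted CSV into parts that stay under the 5 GB
# single-PUT limit, repeating the header line in every part.
SOURCE_FILE = "acdoca_full_export.csv"   # hypothetical name of the extracted flat file
MAX_PART_BYTES = 5 * 1024 ** 3           # keep each part below 5 GB

part_no = 0
part_bytes = 0
out = None

with open(SOURCE_FILE, "rb") as src:
    header = src.readline()              # keep the header for every part file
    for line in src:
        if out is None or part_bytes + len(line) > MAX_PART_BYTES:
            if out is not None:
                out.close()
            part_no += 1
            out = open(f"acdoca_part_{part_no:03d}.csv", "wb")
            out.write(header)
            part_bytes = len(header)
        out.write(line)
        part_bytes += len(line)

if out is not None:
    out.close()
print(f"wrote {part_no} part files")

Each part file can then be uploaded to the bucket individually, which is essentially what split on Linux followed by separate uploads would give you.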

Thanks

Nawfal

najari
Participant

Thank you, Nawfal!

We are currently working on splitting the table into multiple smaller files. We would of course prefer to avoid this solution, because it means constant maintenance of the job (continually adding new target files) and then merging these files in the target system (more processing in AWS S3 means higher costs).

According to the documentation, SAP Data Services uses multipart uploads by default for files larger than 5 MB: https://help.sap.com/docs/SAP_DATA_SERVICES/af6d8e979d0f40c49175007e486257f0/a611106693ea422eb0b0470...

However, if this were true, we wouldn't have faced this issue in the first place.

We now suspect a timeout limitation on the Data Services side when uploading CSV files to the target system.

Do you happen to know whether it's possible to increase the timeout limit for the target file upload?

Best regards,
Saleem

Nawfal
Active Participant

Hi Saleem,

I'm not aware of any timeout option you can increase other than the Request Timeout available in the S3 file location options. I would suggest checking the job trace log for timeout indications.

You could run some tests uploading large files with the AWS CLI to see how the transfer behaves outside of BODS and on your network. You can also check whether the files are compressed before the copy to S3, which greatly reduces file size and bandwidth.
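If you prefer a script to the AWS CLI for that test, a minimal boto3 sketch like the one below does the same kind of transfer; it assumes boto3 is installed and AWS credentials are configured locally, and the bucket name, object key, and file name are placeholders. boto3's transfer manager switches to multipart uploads above the configured threshold, so a file much larger than 5 GB never goes through a single PUT.

import boto3
from boto3.s3.transfer import TransferConfig

# Multipart settings: anything above 100 MB is uploaded in 100 MB parts.
config = TransferConfig(
    multipart_threshold=100 * 1024 ** 2,
    multipart_chunksize=100 * 1024 ** 2,
    max_concurrency=8,
    use_threads=True,
)

s3 = boto3.client("s3")
s3.upload_file(
    Filename="acdoca_full_export.csv",        # local test file (placeholder)
    Bucket="my-datalake-bucket",              # placeholder bucket name
    Key="raw/acdoca/acdoca_full_export.csv",  # placeholder object key
    Config=config,
)
print("upload finished")

If this transfer completes but the Data Services job does not, that points more towards a timeout or configuration issue in the job than towards the S3 limit itself.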

Regarding the table split in Designer, the job wouldn't need any maintenance once built. You only need one target whose filename is set from a global variable; the generated file gets a different name on each pass through the loop, as the sketch below illustrates.
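This is only a conceptual model of that loop in plain Python, not Data Services script: the row counts, chunk size, and filename pattern are assumptions, but it shows how one target plus a changing global filename variable covers the whole table without adding new targets.

# Conceptual model of the Designer while loop driving a single target file.
TOTAL_ROWS = 250_000_000        # e.g. the row count of ACDOCA, looked up at runtime
ROWS_PER_FILE = 10_000_000      # chunk size chosen so each file stays well under 5 GB

iteration = 0
start_row = 1
while start_row <= TOTAL_ROWS:
    iteration += 1
    end_row = min(start_row + ROWS_PER_FILE - 1, TOTAL_ROWS)

    # In the job these would be global variables read by the query filter
    # and by the target file format's filename option.
    g_filename = f"ACDOCA_{iteration:04d}.csv"
    print(f"iteration {iteration}: rows {start_row}-{end_row} -> {g_filename}")

    start_row = end_row + 1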

Keep exploring the various options until you find the one that best serves your requirement.

Thanks

Nawfal