siva_prakash
Explorer
Hi, SAP community!

I'd like to present a proof of concept I undertook, focusing on a particular scenario in Datasphere.

Problem Statement:

Consider a scenario where we need to store data from 1,000+ Excel files in SAP Datasphere. With the features available in SAP Datasphere as of version 2023.21, data from CSV files can be uploaded into a local table via Data Builder > Import CSV File.

However, the drawback of this method is that it has a size limit of around 200 MB, and files cannot be queued; they have to be uploaded one at a time. Given the number of files, and depending on the transformations and cleansing required, this can quickly become a daunting task.

Solution:

I embarked on a proof of concept (POC) to automate this mammoth manual task. Leveraging AI assistants like ChatGPT and Bing AI, I built a pipeline that encompasses the following steps:

Steps:

  1. Initial Upload: Gather the required CSV files and upload them to Google Drive.

  2. Mounting the Drive to Colab: Use Python to mount your Google Drive in Google Colab.

  3. Data Transformation with Python: Run a Python script in Google Colab to process and cleanse the data, and save the results as new Excel files in a designated Google Drive folder (see the sketch after this list).

  4. Upload into Google Cloud Storage: Set up an account on the Google Cloud Platform and move the transformed files into a bucket in Google Cloud Storage (GCS).

  5. Integration with Google BigQuery (BQ): Transfer data from GCS to a structured table in BQ.

  6. SAP Datasphere & BQ Connection: Set up a BigQuery connection in Datasphere.

  7. Dataflow: Push data from BQ to a local table in Datasphere using the BQ source connection in Dataflow.

  8. Dashboard: Craft a story in SAC to visualize the processed data.
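
To give a flavour of steps 2 and 3, here is a minimal sketch of the kind of Colab script involved. The folder names and cleansing steps below are placeholder assumptions; your own transformations will differ.

from google.colab import drive
import os
import pandas as pd

# Mount Google Drive inside the Colab runtime (step 2)
drive.mount('/content/drive')

# Placeholder folders - adjust to your own Drive structure
source_folder = '/content/drive/My Drive/raw_files'
target_folder = '/content/drive/My Drive/transformed_files'
os.makedirs(target_folder, exist_ok=True)

# Loop over every CSV file, apply the cleansing logic, and save the result
for file_name in os.listdir(source_folder):
    if not file_name.lower().endswith('.csv'):
        continue
    df = pd.read_csv(os.path.join(source_folder, file_name))

    # Example cleansing steps - replace with your own transformations
    df = df.dropna(how='all')                      # drop completely empty rows
    df.columns = [c.strip() for c in df.columns]   # tidy up column names

    # Save the cleansed data as an Excel file (or use to_csv for CSV output)
    out_name = os.path.splitext(file_name)[0] + '.xlsx'
    df.to_excel(os.path.join(target_folder, out_name), index=False)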


Architecture Diagram:


High-level architecture of automating flat file upload into Datasphere using AI, Google GCS, and BQ


Deep Dive into the Process:

  • Python coding in Colab with AI Tools Support: I automated data transformations for a multitude of Excel files by leveraging Google Colab as a Python environment together with its libraries. With the assistance of AI tools like ChatGPT and Bing AI, the Python script was developed, tested, and refined incrementally. I validated the code initially with a smaller dataset, then backtested it with larger datasets and added some error handling.

  • Google Storage Solutions & BigQuery Integration: I used GCS to house the transformed files. Initially, I tried fetching data directly from Google Cloud Storage into SAP Datasphere. However, this method required creating a projection for every file in our bucket. To circumvent this, I decided to structure the data in Google BigQuery using the data from the GCS bucket (a sketch of how these steps can be scripted follows this list).

  • Connecting BQ to Datasphere: With the data aptly structured in BQ, I then integrated it with SAP Datasphere, ensuring a smooth and accurate data transition.
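
For reference, the GCS upload and the BigQuery load can also be scripted with Google's official client libraries rather than done through the console. Below is a minimal sketch, assuming the transformed files are exported as CSVs (BigQuery load jobs read CSV, JSON, Avro, Parquet, and ORC from GCS, not .xlsx) and that the project, bucket, and dataset names are replaced with your own:

from google.cloud import bigquery, storage

# Placeholder names - replace with your own project, bucket, and dataset
project_id = 'my-gcp-project'
bucket_name = 'my-transformed-files'
table_id = 'my-gcp-project.my_dataset.flat_file_data'

# Upload one transformed file into the GCS bucket
storage_client = storage.Client(project=project_id)
bucket = storage_client.bucket(bucket_name)
bucket.blob('transformed/file_001.csv').upload_from_filename('file_001.csv')

# Load every CSV under the prefix into a single BigQuery table
bq_client = bigquery.Client(project=project_id)
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,                   # let BigQuery infer the schema
    write_disposition='WRITE_APPEND',  # append each file to the same table
)
load_job = bq_client.load_table_from_uri(
    'gs://' + bucket_name + '/transformed/*.csv', table_id, job_config=job_config
)
load_job.result()  # wait for the load job to finish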


Conclusion:

With strategic AI collaboration, we transitioned from a labor-intensive process to an automated, efficient solution. Embracing AI tools, rather than viewing them as competition, can amplify human capabilities, leading to groundbreaking solutions.

I hope this blog offers a clear perspective on integrating SAP with non-SAP solutions. By adopting such methodologies, you might find ways to enhance your development process and potentially save time.

Hope this article was helpful.

For further insights, please refer to these articles:

Create Google Cloud Storage Bucket

Prepare Connectivity to Google BigQuery

Creating a Google Big Query connection in SAP Datasphere (DWC)
4 Comments
AravindR
Explorer
Really insightful and brings a new perspective!
siva_prakash
Explorer
Thanks Aravind Rajendran, glad you found it helpful.
siddhesh_kadam
Explorer
Can you please help with the below 2 points:

1. Is the pulling of files from Google Drive to Colab automated in some way? I mean, if we upload more files to Drive, will it automatically pull them and run the Python code? If not, is there a way we can achieve it?

2. Also, is the movement of files from Google Drive to GCS automated? If not, any idea how we can do that?

Also, this was a really good and insightful read that gave a new perspective. Thanks for sharing!

 
siva_prakash
Explorer
Hi Siddhesh,

Sure, and thanks for posting your questions. I will answer them below:

1. The data from Google Drive can be accessed in Google Colab with a simple piece of Python code that mounts the drive:
from google.colab import drive
drive.mount('/content/drive')

After mounting, specify the folder from which you wish to retrieve your data into Colab and perform your necessary logic:
# Import necessary libraries
import os

# Specify the path to the folder in your Google Drive
folder_path = '/content/drive/My Drive/Your_Folder_Name'

# Perform the necessary operations
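
Colab does not watch the Drive folder on its own, so the notebook has to be re-run to pick up newly uploaded files, but the script can simply process whatever is in the folder at that point. A small sketch (assuming CSV files and pandas):

import pandas as pd

# Process every CSV currently in the folder; files uploaded to Drive later
# are picked up the next time this cell is run
for file_name in os.listdir(folder_path):
    if file_name.lower().endswith('.csv'):
        df = pd.read_csv(os.path.join(folder_path, file_name))
        # ... apply your transformation logic here ...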

2. In my case, I downloaded the transformed files from Google Drive and uploaded them directly into Google Cloud Storage via drag and drop. It was straightforward and simple. The same should be possible with code and can be automated if needed, for example:
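
Once the drive is mounted in Colab, the files can be pushed into the bucket with the google-cloud-storage client instead of downloading them manually (the project, bucket, and folder names below are placeholders):

from google.colab import auth
from google.cloud import storage
import os

# Authenticate the Colab session so the storage client can access GCS
auth.authenticate_user()

# Placeholder names - replace with your own project, bucket, and Drive folder
bucket = storage.Client(project='my-gcp-project').bucket('my-transformed-files')
drive_folder = '/content/drive/My Drive/transformed_files'

# Copy every file from the mounted Drive folder into the GCS bucket
for file_name in os.listdir(drive_folder):
    bucket.blob(file_name).upload_from_filename(os.path.join(drive_folder, file_name))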

Hope it helps:)