
First, an overview of the Logistic Regression method that we are going to implement using SAP Data Intelligence. Logistic Regression is a statistical method that was first used in the biological sciences in the early twentieth century and was later adopted in many social science applications. Logistic Regression is used when the dependent variable (target) is categorical.

For example:

To predict whether an email is spam (1) or not (0)

To predict whether a tumor is malignant (1) or not (0)

Consider a scenario where we need to classify whether a tumor is malignant or not. If we use linear regression for this problem, we need to set a threshold on which the classification is based. Say the actual class is malignant, the predicted continuous value is 0.4, and the threshold value is 0.5: the data point will be classified as not malignant, which can lead to serious consequences in practice. Logistic Regression avoids this by modelling the probability of the positive class directly, so its output is always between 0 and 1. The following article shows how to implement the Logistic Regression statistical method on a dummy data set using SAP Data Intelligence.
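Before moving to SAP Data Intelligence, here is a minimal sketch (not part of the original pipeline) of the idea in scikit-learn: on hypothetical toy data, logistic regression outputs a bounded class probability rather than an unbounded continuous value, so the 0.5 threshold has a direct probabilistic meaning.

```python
# Minimal sketch, assuming scikit-learn; the data is hypothetical toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# One feature (e.g. tumor size) and a binary target (malignant or not).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns P(class 0) and P(class 1); predict applies
# the default 0.5 threshold to P(class 1).
print(model.predict_proba([[3.5]]))
print(model.predict([[3.5]]))
```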

UPLOAD THE DATA TO THE S3 BUCKET

From the SAP DI Launchpad, go to Connection Management.

yverma_2-1711018620457.png

Click the icon for creating a new connection.

yverma_3-1711018662276.png

Select S3 as the connection type.
Enter the details of the connection in the Data Intelligence Connection Management window.

yverma_4-1711018689257.png

 

When we click Check Status, it shows the status of the connection.

yverma_6-1711018737139.png

 

Open the Metadata Explorer to upload the data to the AWS S3 bucket.

yverma_7-1711018780602.png

 

In the Metadata Explorer, go to the Catalog and then select Browse Connections.

yverma_8-1711018802973.png

 

The Browse Connections window opens, as shown in the screenshot.
Select the S3 Cube connection to upload the data to the S3 bucket.

yverma_9-1711018887676.png

Then we open our directory.
We upload our data files by clicking the upload icon.

yverma_10-1711018938456.png

yverma_11-1711018952699.png

Select the file that we want to upload to the S3 bucket and click Upload.

yverma_12-1711018975877.png

After uploading the files, the Upload Complete status is shown in green.

yverma_13-1711018991547.png

The file is successfully uploaded to the S3 bucket.
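The upload above goes through the Metadata Explorer UI. Purely as a hedged aside (the bucket name, object key, and file name are hypothetical, and credentials are assumed to come from the AWS environment), the same upload could be scripted with boto3:

```python
# Hedged aside: uploading the dataset to S3 with boto3 instead of the UI.
import boto3

s3 = boto3.client("s3")  # credentials resolved from the environment/AWS config
s3.upload_file(
    Filename="titanic.csv",      # local file; illustrative name
    Bucket="my-s3-cube-bucket",  # hypothetical bucket behind the s3Cube connection
    Key="data/titanic.csv",      # target object key in the bucket
)
```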

 

BUILD THE PIPELINE IN THE MODELER TILE

Go to the Modeler tile and open the Modeler window. There are five tabs: Graphs, Operators, Repository, Configuration Types, and Data Types.

yverma_15-1711019079349.png

 

First, we create a graph: click the ‘+’ icon on the Graphs tab.
Once the graph is created, search for the Read File operator in the Operators tab.

yverma_16-1711019118524.png

 

To access AWS S3 files in the pipeline, we can use the Read File operator, which can read from S3 directly.
Drag and drop the Read File operator onto the canvas.

yverma_17-1711019146188.png

 

Search for Wiretap in the Operators tab.
Drag and drop the Wiretap operator into the graph area.

yverma_18-1711019167265.png

 

The Read File operator has the following ports:
Input port (ref), type message.fileReference: a file reference pointing to the file to be read. If the reference is a directory, nothing is done and no output is produced.
Output port (file), type message.file: a file whose contents may be presented as a whole or in batches, according to the operator configuration.
Output port (error), type message.error: an error message, in case an error was raised during an operation.
Here, we connect the Read File operator's message.file port to the Wiretap input.

yverma_19-1711019185714.png

In the Read File connection configuration, select “Connection Management” as the configuration type.
For the connection ID, select s3Cube.

yverma_20-1711019254973.png

 

Next, we select the file path in the Read File configuration.

yverma_21-1711019284213.png

 

We can also check the configuration of the Wiretap operator.

yverma_22-1711019305506.png

 

Now, save the graph and run it.

yverma_23-1711019324487.png

 

Once the graph is running, click on the running instance of the graph and click Open UI to see the output in the Wiretap.

yverma_24-1711019340024.png

 

Here, we can see the output data in the Wiretap.

yverma_25-1711019354945.png

USE PANDAS IN THE PYTHON3 OPERATOR FOR DATA WRANGLING

With the Modeler window open, go to the Operators tab and search for the Python3 Operator.
Drag and drop the Python3 Operator into the graph area.
Then right-click the Python3 Operator and select Add Port to add the input and output ports.

yverma_0-1711067530906.png

Here, we add an input port and an output port so that the operator can receive and emit data.

yverma_1-1711067539477.png

 

yverma_2-1711067616559.png

The following is a depiction of the pipeline; a ToString Converter has been used to convert the data into a string.

 

yverma_3-1711067637272.png

Inside the Python3 Operator, the data manipulation is performed, which requires the pandas library; a hedged sketch of such a script follows the screenshot below.

yverma_4-1711067748014.png
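The screenshot above shows the actual script. As a hedged sketch of what such a script can look like (the port names "input" and "output", their string type, and the CSV layout are assumptions, not taken from the original):

```python
# Hedged sketch of a data-wrangling script inside the Python3 Operator.
# `api` is injected by the Data Intelligence runtime; ports "input" and
# "output" of type string are assumed, carrying CSV text.
import io
import pandas as pd

def on_input(data):
    # Parse the incoming CSV text into a pandas DataFrame.
    df = pd.read_csv(io.StringIO(data))

    # Illustrative wrangling: drop rows with missing values.
    df = df.dropna()

    # Emit the wrangled data as CSV text on the output port.
    api.send("output", df.to_csv(index=False))

api.set_port_callback("input", on_input)
```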

Output of the Pipeline execution:

yverma_5-1711067782949.png

UPLOAD THE SAME DATA INTO THE LOCAL DI DATA LAKE THROUGH THE DATA MANAGER

yverma_6-1711067860555.png

We create a data collection by clicking the Create button.

yverma_7-1711067908939.png

yverma_8-1711067924197.png

In the Metadata Explorer, we upload the data to the DI Data Lake.

yverma_9-1711068008969.png

yverma_10-1711068018126.png

CREATE THE JUPYTER ENVIRONMENT

yverma_11-1711068036168.png

yverma_12-1711068049207.png

Create an ML scenario by clicking the + icon.

After creating the ML scenario, we have various sections: Datasets, Notebooks, Pipelines, Executions, Models, and Deployments.

yverma_13-1711068089414.png

To create the Jupyter notebook, click the + icon in the Notebooks section.

yverma_15-1711068157157.png

yverma_16-1711068168412.png

Exploratory Data Analysis using the Jupyter notebook on the dataset which we uploaded above.

yverma_17-1711068255206.png

Open Jupyter Notebook

yverma_18-1711068274585.png

Go to the Data Browser to select our workspace.

yverma_19-1711068299235.png

yverma_20-1711068313375.png

Open our data workspace

yverma_22-1711069225050.png

 

After opening the data workspace, click Copy code snippet to clipboard and paste the snippet into a Jupyter notebook cell.

yverma_23-1711069237512.png

Create a new kernel in a new environment so that the libraries can be installed in an isolated manner (for example, a virtual environment registered as a kernel with ipykernel).
Open the Launcher and go to the Terminal.

yverma_24-1711069303805.png

 

yverma_25-1711069350321.png

Now, the kernel is successfully created.

yverma_26-1711069367077.png

 

Select the new kernel for the Jupyter notebook.

yverma_27-1711069380342.png

Now, we can perform exploratory data analysis in the Jupyter notebook.

yverma_28-1711069403038.png

 

Check the distribution of age among passengers.

yverma_29-1711069417886.png

Here, we plot the correlation matrix as a heat map to identify correlations between the features.

yverma_30-1711069430798.png
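As a hedged sketch of these two EDA steps (the file name titanic.csv and column names such as Age are assumptions based on the screenshots, not confirmed in the text):

```python
# Hedged EDA sketch on a hypothetical local copy of the dataset.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("titanic.csv")  # illustrative file name

# Distribution of age among passengers.
df["Age"].plot(kind="hist", bins=30, title="Age distribution")
plt.show()

# Correlation matrix of the numeric features as a heat map.
sns.heatmap(df.select_dtypes("number").corr(), annot=True, cmap="coolwarm")
plt.show()
```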

Now, we train a LogisticRegression model to classify whether a passenger survived or not.
Once the model is trained, we check the model accuracy.

yverma_31-1711069462033.png
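A hedged sketch of this training step; the feature columns are illustrative, and the exact feature set used in the original notebook is only visible in the screenshot:

```python
# Hedged sketch: train a LogisticRegression model and check its accuracy.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("titanic.csv")  # as in the EDA sketch; illustrative name

features = ["Pclass", "Age", "Fare"]  # hypothetical feature columns
X = df[features].fillna(df[features].median())
y = df["Survived"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```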

BUILD THE TRAINING PIPELINE AND SAVE THE MODEL

yverma_32-1711069470542.png

Creating the training pipeline.

 

yverma_33-1711069580325.png

 

Here, we give the pipeline a name and select the Python Producer template.

yverma_34-1711069612508.png

In this Python Producer graph (the training pipeline), we first group the Python3 Operator. After that, we need to tag the group with the tags of a Docker image that provides the required libraries (see the first reference below on building a custom Dockerfile).

yverma_35-1711069684688.png

 

Script of the Python3 Operator:

yverma_36-1711069702740.png
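As a hedged sketch of such a producer script (the port names follow my understanding of the Python Producer template: an input port delivering the training data and a "modelBlob" output wired to the Artifact Producer; treat all details as assumptions):

```python
# Hedged sketch of the training script in the Python Producer graph.
# `api` is injected by the Data Intelligence runtime; port names and
# column names are assumptions, not taken from the original.
import io
import pickle
import pandas as pd
from sklearn.linear_model import LogisticRegression

def on_input(data):
    df = pd.read_csv(io.StringIO(data))

    features = ["Pclass", "Age", "Fare"]  # hypothetical feature columns
    X = df[features].fillna(df[features].median())
    y = df["Survived"]

    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)

    # Hand the pickled model to the Artifact Producer, which stores it
    # in the Semantic Data Lake.
    api.send("modelBlob", pickle.dumps(model))

api.set_port_callback("input", on_input)
```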

 

Save the pipeline and run it. After running the pipeline, we check the execution status.
The Artifact Producer saves the model pickle file in the Semantic Data Lake for further use.

yverma_37-1711069783670.png

CREATE THE INFERENCE PIPELINE FOR THE SAVED MODEL
TEST IT WITH THE POSTMAN APP TO SEE IF WE ARE GETTING THE OUTPUT
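The original post stops here. As a hedged illustration only (the URL pattern, credentials, and payload schema are all hypothetical; the real endpoint URL comes from the Deployments section of the ML scenario), the Postman test could equally be scripted:

```python
# Hedged sketch of calling a deployed inference endpoint; every value
# below is a placeholder, not taken from the original post.
import requests

url = "https://<di-host>/app/pipeline-modeler/openapi/service/<deployment-path>/v1/inference"
payload = {"Pclass": 3, "Age": 29.0, "Fare": 7.25}  # illustrative features

resp = requests.post(
    url,
    json=payload,
    auth=("tenant\\user", "<password>"),             # hypothetical credentials
    headers={"X-Requested-With": "XMLHttpRequest"},  # commonly required by DI
)
print(resp.status_code, resp.text)
```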

References:

https://community.sap.com/t5/technology-blogs-by-sap/di-basics-building-a-custom-dockerfile/ba-p/135...

https://help.sap.com/docs/SAP_Best_Practices/8c92d5da091847f8bc1f1b319f3df70a/c7bf2bd9fa6b4ddaad46a4...
