Creating a predictive analytic application using R in SAP Analytics Cloud

danishmeraj · 10-13-2022

In my previous blog, I discussed an algorithm for creating a risk prediction tool by combining machine learning and an aggregation algorithm. In this blog, I will take a different approach to creating a predictive analytic application, leveraging the R interface in SAP Analytics Cloud (SAC). You can follow along with me in this blog to achieve similar results.

Data: The data used for this demonstration is publicly available and easily accessible. You can download the data from here.

You can also read the blog here for a basic introduction to multiple linear regression. It also provides background on the data used in this demonstration.

Data Model: Let's start by creating a data model from the dataset in the SAC modeller. We add a date dimension column using a calculated column formula; this allows us to create a planning model, which requires a date dimension.

Note: A planning model is not mandatory for creating an analytic application. I chose one so that this blog would stay relevant if I later write a follow-up on making the analytic application demonstrated here more interactive.

Figure 1: The figure shows the data in SAC modeller; Source: Author’s own illustration.

Configuring the R widget: In this step, the data source is configured in the R widget, as shown in Figure 2:

Figure 2: The figure shows the input data in R visualization widget; Source: Author’s own illustration.

Scripting in R: After configuring the data source, we will leverage the scripting capability of R to train a machine-learning model.

First, we load the required packages into our R environment and create a data frame from the source data, as shown in the code snippet below.

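# Load packages (not strictly required: the code in this blog uses only base R functions)
library(ggplot2)
library(dplyr)

# HeartData is the input data frame configured for the R widget (Figure 2)
df <- HeartData
head(df)
summary(df)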
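HeartData only exists inside the R widget, where SAC supplies it from the configured data source. To try the script in a local R session, a minimal sketch is to build a stand-in data frame with the same three columns (biking, smoking, heart.disease, taken from the script). The values below are synthetic: the ranges and coefficients are illustrative assumptions, not the real dataset, so any results will differ from the figures in this blog.

```r
# Hypothetical stand-in for the HeartData frame that SAC passes into the R widget.
# All values are synthetic; ranges and coefficients are illustrative only.
set.seed(42)
n <- 500
biking  <- runif(n, min = 1, max = 75)    # synthetic biking %
smoking <- runif(n, min = 0.5, max = 30)  # synthetic smoking %
heart.disease <- 15 - 0.2 * biking + 0.18 * smoking + rnorm(n, sd = 0.6)

HeartData <- data.frame(biking = biking, smoking = smoking,
                        heart.disease = heart.disease)

# From here on, the blog's script can run unchanged
df <- HeartData
head(df)
summary(df)
```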

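After that, we check the correlation between the two independent variables (a strong correlation between predictors would make their individual coefficients hard to interpret) and plot a histogram of the dependent variable, as shown in the code snippet below.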

# Check the correlation between the two independent variables
cor(df$biking, df$smoking)

# Histogram of the dependent variable, heart disease
hist(df$heart.disease)
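Beyond a single pair, the correlation matrix of all columns gives the same check at a glance: the biking/smoking entry is the one relevant to collinearity between the predictors, while the other entries show how each predictor relates to the response. A minimal sketch on synthetic stand-in data (the generated values are illustrative assumptions, not the real dataset):

```r
# Synthetic stand-in data with the same column names as HeartData (illustrative only)
set.seed(42)
n <- 500
df <- data.frame(biking = runif(n, 1, 75), smoking = runif(n, 0.5, 30))
df$heart.disease <- 15 - 0.2 * df$biking + 0.18 * df$smoking + rnorm(n, sd = 0.6)

# Pairwise Pearson correlations between all three columns, rounded for readability
cor_matrix <- round(cor(df), 2)
cor_matrix

# The biking/smoking entry equals cor(df$biking, df$smoking), rounded
cor_matrix["biking", "smoking"]
```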

To make the results reproducible, I set the random seed to 1. This lets you reproduce the same train/test split, and therefore the same results, shown in this blog.

The next step is data partitioning. The data is divided into two parts, i.e., training and testing. In this example, each row is randomly assigned to training with probability 0.7 and to testing with probability 0.3, so roughly 70% of the dataset is used for training and roughly 30% for testing.

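# Make this example reproducible
set.seed(1)

# Randomly assign each row to training (prob 0.7) or testing (prob 0.3)
sample <- sample(c(TRUE, FALSE), nrow(df), replace = TRUE, prob = c(0.7, 0.3))
train <- df[sample, ]
test  <- df[!sample, ]

# Training: multiple linear regression of heart disease on biking and smoking
heart.disease.lm <- lm(heart.disease ~ biking + smoking, data = train)

# Summary of training (stored as 'summary'; used later to extract the coefficients)
summary <- summary(heart.disease.lm)
summary

# Prediction using the test data
heart.disease.predictions <- predict(heart.disease.lm, test)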
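Because sample(c(TRUE, FALSE), ..., prob = c(0.7, 0.3)) draws each row's assignment independently, the split is only approximately 70/30 and the exact sizes depend on the seed. If an exact proportion is preferred, one alternative (not the approach used in this blog) is to sample row indices directly. A minimal sketch on synthetic stand-in data (illustrative values, not the real dataset):

```r
# Synthetic stand-in data (illustrative only; not the real dataset)
set.seed(42)
n <- 500
df <- data.frame(biking = runif(n, 1, 75), smoking = runif(n, 0.5, 30))
df$heart.disease <- 15 - 0.2 * df$biking + 0.18 * df$smoking + rnorm(n, sd = 0.6)

# Exact 70/30 split: draw 70% of the row indices without replacement
set.seed(1)
train_idx <- sample(seq_len(nrow(df)), size = round(0.7 * nrow(df)))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

c(train = nrow(train), test = nrow(test))

# Re-running with the same seed reproduces the same split
set.seed(1)
same_split <- identical(train_idx, sample(seq_len(nrow(df)), size = round(0.7 * nrow(df))))
same_split
```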

After training the model on the training dataset, we use it to make predictions on the test dataset. The cbind function is used to create a table of the testing results containing the predicted and actual values. This gives an overview of the model's performance on the test dataset, as shown in the code snippet below.

# Create a result table using the column bind function:
# predicted values and actual values of the test data
results <- cbind(heart.disease.predictions, test$heart.disease)
colnames(results) <- c('predicted', 'actual')  # name the columns of the result
results <- as.data.frame(results)

# Show the first rows of predicted vs. actual test values
head(results)
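To summarise the test results as numbers rather than reading them row by row, common choices are the root mean squared error (RMSE), the mean absolute error (MAE) and the R-squared of the predictions on the test data. These metrics are not part of the original script; this is a minimal sketch that builds a results data frame with predicted and actual columns the same way as above, using synthetic stand-in data (illustrative values, not the real dataset) so the block runs on its own.

```r
# Synthetic stand-in data (illustrative only; not the real dataset)
set.seed(42)
n <- 500
df <- data.frame(biking = runif(n, 1, 75), smoking = runif(n, 0.5, 30))
df$heart.disease <- 15 - 0.2 * df$biking + 0.18 * df$smoking + rnorm(n, sd = 0.6)

# Same kind of split and model as in the blog
set.seed(1)
in_train <- sample(c(TRUE, FALSE), nrow(df), replace = TRUE, prob = c(0.7, 0.3))
train <- df[in_train, ]
test  <- df[!in_train, ]
model <- lm(heart.disease ~ biking + smoking, data = train)

results <- data.frame(predicted = predict(model, test), actual = test$heart.disease)

# Error metrics on the held-out test data
errors <- results$actual - results$predicted
rmse <- sqrt(mean(errors^2))
mae  <- mean(abs(errors))
r_squared <- 1 - sum(errors^2) / sum((results$actual - mean(results$actual))^2)

round(c(RMSE = rmse, MAE = mae, R2 = r_squared), 3)
```

RMSE penalises large errors more heavily than MAE, so a large gap between the two suggests a few test rows are predicted much worse than the rest.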

The next step is to retrieve the parameters from the trained model so that the analytic designer environment can use them in the regression formula for prediction.
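Written out, the fitted model is the multiple linear regression equation below, where $\hat{\beta}_0$ is the intercept and $\hat{\beta}_1$, $\hat{\beta}_2$ are the estimated coefficients for biking and smoking. These three estimates are the values the script stores in Intercept, Biking and Smoking.

```latex
\widehat{\text{heart.disease}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{biking} + \hat{\beta}_2 \cdot \text{smoking}
```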

To retrieve these values from the training summary (Figure 3), we first extract the matrix of coefficients from the summary. Its first column contains the estimates for the intercept, biking and smoking, in that order, so the required parameters can be read from it by position. We then save each parameter in a variable that the analytic designer environment can access.

Figure 3: The figure shows the summary output in R console; Source: Author’s own illustration.

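# Matrix of coefficients from the training summary
# (rows: (Intercept), biking, smoking; column 1: Estimate)
matrix_coef <- summary$coefficients
matrix_coef

# Grab the coefficient estimates from the matrix; these variable names are
# read later by the analytic application via getEnvironmentValues()
Intercept <- matrix_coef[1, 1]
Biking <- matrix_coef[2, 1]
Smoking <- matrix_coef[3, 1]

# Print the coefficients
Intercept
Biking
Smoking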

Figure 4 shows the parameter values in the R console. These values are used in the multiple linear regression equation to calculate the predicted values.

Figure 4: The figure shows the parameters values as output in the R console; Source: Author’s own illustration.
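Indexing the coefficient matrix by position (matrix_coef[2,1]) relies on the predictors appearing in the same order as in the formula. A slightly more robust alternative, not used in the original script, is to index by name, either with coef() or with the row and column names of the summary matrix ("Estimate" is the name of its first column). A minimal sketch on synthetic stand-in data (illustrative values, not the real dataset):

```r
# Synthetic stand-in data (illustrative only; not the real dataset)
set.seed(42)
n <- 500
df <- data.frame(biking = runif(n, 1, 75), smoking = runif(n, 0.5, 30))
df$heart.disease <- 15 - 0.2 * df$biking + 0.18 * df$smoking + rnorm(n, sd = 0.6)

model <- lm(heart.disease ~ biking + smoking, data = df)

# Name-based extraction: unaffected by the order of terms in the formula
Intercept <- coef(model)[["(Intercept)"]]
Biking    <- coef(model)[["biking"]]
Smoking   <- coef(model)[["smoking"]]

# Equivalent lookup in the summary's coefficient matrix
matrix_coef <- summary(model)$coefficients
matrix_coef["biking", "Estimate"]
```

Keeping the variable names Intercept, Biking and Smoking means the getNumber() calls in the analytic application would not need to change.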

Creating the front end for the analytic application:

In this step, the front end of the application is built from widgets such as Text, Text Area and Input Field (Figure 5). The resulting analytic application is shown in Figure 6.

Figure 5: The figure shows the widgets used to create application front end; Source: Author’s own illustration.

Figure 6: The figure shows the application front end and a histogram; Source: Author’s own illustration.

Button_1 onClick script: This script runs when the user clicks the "Predict" button after entering the required input values, i.e., Smoking% and Biking%.

// Get the parameter values from the R environment
var Intercept = RVisualization_2.getEnvironmentValues().getNumber("Intercept");
var PBiking = RVisualization_2.getEnvironmentValues().getNumber("Biking");
var PSmoking = RVisualization_2.getEnvironmentValues().getNumber("Smoking");
console.log(Intercept);
console.log(PBiking);
console.log(PSmoking);

// Convert the user input strings to numbers
var Biking = ConvertUtils.stringToNumber(InputField_1.getValue());
var Smoking = ConvertUtils.stringToNumber(InputField_2.getValue());

// Calculate the prediction using the multiple linear regression equation
var formula = Intercept + PBiking * Biking + PSmoking * Smoking;

// Round the output to the nearest integer
var predictionFormula = Math.round(formula);

// Print the output to the text box
Text_1.applyText("Based on input values the prediction for %people having heart disease in the city is: " + ConvertUtils.numberToString(predictionFormula) + "%");
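The onClick script recomputes the prediction by hand from the three coefficients instead of calling the model. As a sanity check, the same formula can be evaluated in R and compared with predict(); the two should agree up to floating-point error. A minimal sketch on synthetic stand-in data (illustrative values, not the real dataset), with hypothetical inputs of 30% biking and 10% smoking standing in for what a user would type into the input fields:

```r
# Synthetic stand-in data (illustrative only; not the real dataset)
set.seed(42)
n <- 500
df <- data.frame(biking = runif(n, 1, 75), smoking = runif(n, 0.5, 30))
df$heart.disease <- 15 - 0.2 * df$biking + 0.18 * df$smoking + rnorm(n, sd = 0.6)
model <- lm(heart.disease ~ biking + smoking, data = df)

# Hypothetical user inputs (Biking% and Smoking%)
user_biking  <- 30
user_smoking <- 10

# Same arithmetic as the onClick script: Intercept + PBiking*Biking + PSmoking*Smoking
b <- coef(model)
manual <- b[["(Intercept)"]] + b[["biking"]] * user_biking + b[["smoking"]] * user_smoking

# R's own prediction for the same inputs
from_predict <- unname(predict(model, newdata = data.frame(biking = user_biking,
                                                           smoking = user_smoking)))

all.equal(manual, from_predict)

# Mirrors the rounding step; note R's round() rounds exact .5 values to even,
# whereas JavaScript's Math.round rounds them up
round(manual)
```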

You might be wondering why the histogram visualization (Figure 6) is there at all, since it adds no value to the analytic application. Unfortunately, the R visualization widget must produce output in order to pass its environment values to the analytic designer environment.

One workaround is to shrink the visualization widget (keeping it just large enough that the visualization is still rendered) and cover it with a Shape widget with a white background color, as shown in Figure 7. 😄

Let me know if you have other ideas! Seriously.

Figure 7: The figure shows hiding the R visualization widget containing the histogram using the 'Shape' widget; Source: Author’s own illustration.

Testing the prediction tool:

As shown in Figure 8, our analytic application is now ready. After the user enters the required input values, it returns a predicted value, as shown in Figure 9.

Figure 8: The figure shows the final front end of the analytic application; Source: Author’s own illustration.

Figure 9: The figure shows the predicted values based on the user input; Source: Author’s own illustration.

Complete R Script used in this demonstration:

# Load packages (not strictly required: the script uses only base R functions)
library(ggplot2)
library(dplyr)

# HeartData is the input data frame configured for the R widget
df <- HeartData
head(df)
summary(df)

# Check the correlation between the two independent variables
cor(df$biking, df$smoking)

# Histogram of heart disease
hist(df$heart.disease)

# Make this example reproducible
set.seed(1)
sample <- sample(c(TRUE, FALSE), nrow(df), replace = TRUE, prob = c(0.7, 0.3))
train <- df[sample, ]
test  <- df[!sample, ]

# Training
heart.disease.lm <- lm(heart.disease ~ biking + smoking, data = train)

# Summary of training
summary <- summary(heart.disease.lm)
summary

# Prediction using the test data
heart.disease.predictions <- predict(heart.disease.lm, test)

# Create a result table using the column bind function
results <- cbind(heart.disease.predictions, test$heart.disease)  # predicted and actual values of the test data
colnames(results) <- c('predicted', 'actual')  # name the columns of the result
results <- as.data.frame(results)

# Show the first rows of predicted vs. actual test values
head(results)

# Transfer the values of the coefficients to the analytic designer environment

# Create a matrix of coefficients from the training summary
matrix_coef <- summary$coefficients
matrix_coef

# Grab the coefficient estimates from the matrix (column 1 = Estimate)
Intercept <- matrix_coef[1, 1]
Biking <- matrix_coef[2, 1]
Smoking <- matrix_coef[3, 1]

# Print the coefficients
Intercept
Biking
Smoking

Conclusion: This blog demonstrates how we can leverage R visualization widgets to build a prediction tool in SAC. The prediction tool shown in this blog solves a regression problem using a multiple linear regression algorithm.

If you liked this blog, please like the post and follow me for more content related to SAP Analytics Cloud. If you have any questions or feedback, please leave a comment below.

Further study on similar topics:

https://blogs.sap.com/2022/09/12/automated-machine-learning-automl-using-analytic-application/

https://blogs.sap.com/2020/06/08/r-visualizations-in-sap-analytics-cloud/

https://www.scribbr.com/statistics/simple-linear-regression/

Pavan_Golesar · 10-15-2022

Thanks for the post. Just update the sample data link to this:

Multiple Linear Regression | A Quick Guide (Examples) (scribbr.com)

Pavan Golesar

danishmeraj · 10-15-2022

Thank you very much for your feedback. I have updated the dataset URL 🙂

josephreddy2021 · 10-18-2022

Good one. If SAC could integrate Python, that would be great.

Henry_Banks · 10-18-2022

Hi josephreddy2021

There is a SAC roadmap item for 'Python-like' expressions for data acquisition, planned for H1 2023 (next year):

https://roadmaps.sap.com/board?PRODUCT=67838200100800006884&q=python&range=CURRENT-LAST#;INNO=6EAE8B...

Please add your vote and comment to the Influence item here: https://influence.sap.com/sap/ino/#/idea/225367 (this is the enhancement request you are looking for).

Currently, Python scripts are used further down the SAP stack, in the Data Intelligence product. This is an ETL/ELT tool for predictive data pipelines and is considered a successor product to Data Services / Data Hub. Info here: https://blogs.sap.com/2021/02/03/sap-data-intelligence-custom-python-operator-for-beginners/

Regards, H

josephreddy2021 · 10-18-2022

Thanks, Henry. I am looking forward to using Python for prediction use cases in SAC as well, like Danish used R here.

danishmeraj · 10-19-2022

I agree with you.

Henry_Banks · 10-19-2022

Dear danishmeraj and josephreddy2021

I checked the enhancement requests listed in the SAC Influence portal and could not find any submission that exactly matches the requirement.

Would it be possible for you to collaborate on defining your exact requirement (something like "use Python script in Application Designer visualizations") and submit an improvement request here: Influence Opportunity Homepage - Customer Influence (sap.com)?

Please do post back here on your community blog with the URL so that we can vote on it 🙂

Regards, H

danishmeraj · 10-19-2022

Dear Henry,

I think a similar request is already available on the SAC Influence portal, but it has the status "For long-term consideration". Please check out the URL here: Improvement Request Details - Customer Influence (sap.com).

Regards, Danish

ianbarrow · 09-21-2023

I can apply what you have done, so thank you.
Do you have any experience using live connections when there is a complicated measure (i.e., one split over several rows and including replacement path text variables)?

Sales

Actual

Week 2023.38

Even though the measure is suggested after typing three characters, it does not appear to work unless I use a simple measure like "biking".

Even better, can we enter the technical name of the measure rather than the text? Not sure what will happen here with multilingual descriptions... I did try removing the 'xxx'.

Ian
