david_serre
Product and Topic Expert

Introduction


By default, Predictive Planning proposes to evaluate the future performance of your predictive models using the Horizon-Wide MAPE (Mean Absolute Percentage Error). While this performance measure has multiple advantages (it is interpretable and not unit-dependent, among others), you may want to evaluate your time-series models using specific performance measures that make sense for your specific use case.
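As a reminder, for n points where both the actual value $y_t$ and the forecast $\hat{y}_t$ are known, the standard MAPE is defined as

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|$$

The Horizon-Wide variant evaluated by Predictive Planning applies, roughly speaking, the same idea across the whole forecast horizon; the key point is that the error is expressed relative to the actual value, which is what makes it unit-independent.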

In this blog post we will see how you can compare the performance of time-series models in stories using custom performance measures.

Scenario


Let us assume you want to forecast the future visits to the US National Parks (you can use this earlier Predictive for Planning article as an introduction to this topic). You are mostly interested in having the best forecasts of the total visits across all the parks. But there are two different predictive models you could create in Smart Predict to help achieve this goal:

  • Using a top-down approach, you could create a single predictive model that forecasts the aggregated total visits.

  • Using a bottom-up approach, you could create a predictive model that forecasts the visits to each park individually.


The best way to know which of these two models is likely to provide the best predictions is to compare the forecasting errors the two models make. Let’s assume that you want to compare your two predictive models using the Mean Absolute Error (MAE) metric. This performance indicator is not provided by the default Predictive Planning reports, so you need to use a story to calculate and display it.
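For n dates where both the actual value $y_t$ and the forecast $\hat{y}_t$ are known, the MAE is simply the average of the absolute errors:

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|$$

Unlike the MAPE, it is expressed in the unit of the measure (visits, in our case), which makes it easy to relate to the volumes being forecasted.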

In this blog post we will show you how to calculate the MAE and display it in a predictive model comparison table like the one below.

Predictive Model Performance Comparison Table


 

Obviously, you can easily generalize this explanation to any other standard or ad hoc performance comparison metric.

You can find the data file here if you want to recreate this example.

You can find an explanation about how to create the corresponding planning model here.

Prepare the Planning Model


Start by creating a Story referencing your planning model.

Open the version management panel and create one “blank” version for each predictive model you want to compare. For our scenario, we need two versions: one for the top-down forecasting model (“global”) and one for the bottom-up model segmented by park (“by park”).


Planning Versions


Your planning model is now ready to receive some predictive forecasts.

Create the Time-Series Forecasting Models


Create a Time-Series predictive scenario. In this predictive scenario, create two predictive models using the settings presented below and train them.

Only the “Entity” parameter is different for the two models.

Global model (Entity: None):


 

One model per park (Entity: ParkName):


 

Please refer to this article if you need help using Predictive Scenarios.

Write the Predictions


When you write the predictions to the planning model using the “Save Predictive Forecast” option, by default, Smart Predict writes only the forecasts for the future period (red frame below) to the output version. This is all you need when your goal is only to get future predictions.

To calculate the MAE (or any other model performance indicator) and evaluate the potential future performance of a predictive model, we need to be able to compare the forecasts to real values (“actuals”). The actuals are obviously known only for the past period. That means we need forecasts “in the past” (on the training data partition), where the actuals are known.
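To make this concrete, here is a minimal Python sketch (outside of SAP Analytics Cloud, with hypothetical column names and made-up values) showing that the error can only be computed on rows where both a forecast and an actual exist:

import pandas as pd

# Hypothetical extract of an output version: one row per month, with the
# actual visits (unknown, hence NaN, in the future) and the predictive forecast.
df = pd.DataFrame({
    "date":     ["2019-10", "2019-11", "2019-12", "2020-01", "2020-02"],
    "actual":   [3_200_000, 2_900_000, 2_700_000, None, None],
    "forecast": [3_150_000, 2_950_000, 2_650_000, 2_800_000, 2_600_000],
})

# Only the past period (where actuals are known) can contribute to the error.
past = df.dropna(subset=["actual"])
mae = (past["actual"] - past["forecast"]).abs().mean()
print(f"MAE over the past period: {mae:,.0f} visits")

Without the past forecasts, the forecast column would only be populated for the future rows and there would be nothing to compare the actuals against.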

Past Forecasts Vs Future Forecasts


 

Writing these past forecasts is the purpose of the new “Save Forecasts For Past Period” option.

The two tables below compare the output you get when the option is not enabled and when it is enabled.

"Save Forecasts For Past Period" Not Enabled (default)


"Save Forecasts For Past Period" Enabled


Select the “global” model, then click the “Save Forecasts” button (the button with a factory-like icon).


In the “Save Forecasts” dialog, select the “Global” version you have created previously as private output version.


Expand the “Advanced Settings” section and enable the “Save Forecasts For Past Period” option.


Finally, click Save to save the predictions to the “global” version.

Now do the same for the “by park” model.

Select the “by park” model, then click the “Save Forecasts” button (the button with a factory-like icon).


In the “Save Forecasts” dialog, select the “by park” version you have created previously as private output version.


Expand the “Advanced Settings” section and enable the “Save Forecasts For Past Period” option.


Finally, click Save to save the predictions to the “by park” version.

Prepare the Story


We want to compare the total forecasts as provided by the “global” model to the total forecasts as provided by the “by park” model using the Mean Absolute Error metric (MAE).

So, in terms of story calculations, we need to:

  1. Calculate the absolute difference between the predicted RecreationVisits and the actual RecreationVisits.

  2. Get the average of this value over the “past period”.


Add a Temporal Filter


Start by creating a table in the story. This table will be used to display the comparison of the predictive model performances.

The actuals for the future are obviously unknown, so it’s not possible to compute the error (actual – prediction) for the dates in this period. Since we want to compute some aggregated performance indicator, it’s important to exclude this period when calculating the error and the average of the error.

For this blog post we will compute the MAE for the 1-year period from January 2019 to December 2019, immediately preceding the forecast period.

  • Create a table in the story.

  • In the left settings panel click “Add Filter” then “Date (Member)”.




  • Select the 2019 value only.




 

Calculating the Absolute Difference Between the Predictions and the Actuals


The first problem we must solve is how to calculate the difference Actual.RecreationVisits – Predicted.RecreationVisits. This is not something that can be calculated directly in the formula editor. The trick is to use a “Restricted Measure” to “isolate” the values of Actual.RecreationVisits, i.e. to duplicate them into another measure (a rough script equivalent of this trick is sketched after the steps below).

  • Create a new calculation.




  • Select the type “Restricted Measure” for your calculation.

  • Name it “reference RecreationVisits”.

  • Select “RecreationVisits” as measure to be copied.

  • Copy only the values for “Category = Actual”.

  • Select the “Enable Constant Selection” checkbox, otherwise the previous settings will be ignored.
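If it helps to picture what this restricted measure with constant selection does, here is a rough pandas equivalent (hypothetical column names and made-up values, not how SAP Analytics Cloud computes it internally): the Actual values are copied into a separate column and repeated for every category of the same date, so they can later be subtracted from the forecasted values.

import pandas as pd

# Hypothetical fact table: one row per category (Actual or predictive version) and date.
df = pd.DataFrame({
    "category": ["Actual", "global", "by park", "Actual", "global", "by park"],
    "date":     ["2019-01", "2019-01", "2019-01", "2019-02", "2019-02", "2019-02"],
    "visits":   [100.0, 95.0, 102.0, 120.0, 118.0, 116.0],
})

# "Restricted measure": keep only the Actual values of the visits measure...
actuals = df.loc[df["category"] == "Actual", ["date", "visits"]]
actuals = actuals.rename(columns={"visits": "reference_visits"})

# ..."constant selection": broadcast them to every category sharing the same date.
df = df.merge(actuals, on="date", how="left")
print(df)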




Now let’s calculate the absolute error.

  • Create a new calculation




  • Select the type “Calculated Measure” for your calculation.

  • Name it “absolute error”.

  • Enter the formula ABS([#reference RecreationVisits] - ["national parks frequentation enriched":RecreationVisits])



To get a better understanding of the overall forecast error, we will also represent the total error as a percentage. To do so, we just need to divide the total absolute error by the total actual value.
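In other words, over the filtered period:

$$\text{total error in \%} = \frac{\sum_t \left|y_t - \hat{y}_t\right|}{\sum_t y_t}$$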


At this stage with the proper table configuration, you should get something like this:



Calculating the Error Average Per Date



  • Create a new calculation




  • Choose the “Aggregation” type for the calculation

  • Call the calculation MAE

  • Compute the AVERAGE excl. NULL of the absolute error measure (a script equivalent of this aggregation is sketched after this list).

  • Select the Date as aggregation dimension.
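Continuing the hypothetical pandas sketch from earlier, this aggregation is roughly equivalent to averaging the absolute error of each predictive version over the dates kept by the 2019 filter:

# Absolute error per row, then its average per predictive version.
df["absolute_error"] = (df["reference_visits"] - df["visits"]).abs()
mae_per_version = (
    df[df["category"] != "Actual"]           # the Actual rows have a zero error by construction
    .groupby("category")["absolute_error"]
    .mean()                                  # like AVERAGE excl. NULL, mean() skips missing values
)
print(mae_per_version)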




 

You now have all the calculated measures you need to build the table below.


 

If your goal is only to predict the total visits accurately and you are not interested in consuming the forecasts at the park level, then the “global” predictive model is the one to use. If you care about having accurate total forecasts but also need to drill down to the park level, then the “by park” model is the right model to use.

Conclusion


Using the same logic based on calculated measures, you could also compare the predictive models based on the relative error, the RMSE (Root Mean Square Error), or any custom performance measure that makes sense for you.
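For example, the RMSE only replaces the absolute value by a square and a square root, which penalizes large errors more heavily:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}$$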

I hope this blog post was helpful to you. If you appreciated reading this, I’d be grateful if you left a comment to that effect, and don’t forget to like it as well. Thank you.

Do you want to learn more about Predictive Planning?

 

Find all Q&A about SAP Analytics Cloud and feel free to ask your own question here: https://answers.sap.com/tags/67838200100800006884

Visit your community topic page to learn more about SAP Analytics Cloud: https://community.sap.com/topics/cloud-analytics

4 Comments
achab
Product and Topic Expert
Excellent blog David - this is the perfect hands-on companion to the feature we freshly released in wave 2021.06 https://saphanajourney.com/sap-analytics-cloud/product-updates/release-2021-06/.

This is already available to our partners using test & demo tenants, is planned to be available to our customers on "fast track" systems over the weekend, and is planned to be part of our next quarterly release (May release).

 
souleymane
Explorer
Good Job David.

Just one question: what should you do if the MAPE indicator is too high for the model performance to be considered good?

david_serre
Product and Topic Expert

Hello Souleymane,

First, be sure that you are using the performance indicator that best fits your requirements and your use case. In our example, using the MAPE to evaluate the prediction accuracy may not be the best choice: the MAPE tends to exaggerate the error when the target is close to zero, and there are a few entities in our example (such as Glacier Bay) where the target is close to 0 pretty often. The MAE is a better metric to evaluate our visit predictions.

Once you have chosen the best-suited evaluation metric, if the error is still too high then you need to improve the model quality. There are a few ways to achieve that:

  • You may try to use influencers to improve the model accuracy. The influencers are a new feature that we just introduced in Predictive Planning: https://blogs.sap.com/?p=1356153?source=email-global-notification-bp-new-in-tag-followed
  • You can consider tweaking the size of the training period ("Train Using" parameter). By default, Predictive Planning uses all the available historical data points. This is often the best thing to do, but sometimes it's a better idea to reduce the size of the historical data used to build the model. In the example used in this blog post, we have reduced the training period to the last 5 years. Why? The planning model contains 10 years of historical data. Over 10 years a lot of things have changed, and the patterns that existed 10 years ago (trends, cycles...) don't exist anymore. If you train the model using the 10 years of historical data, the model will learn the old patterns and try to apply them to the future. So, how do you decide whether you should keep 3, 5 or 10 years of historical data? Basically, you have to experiment and check which size provides the best results; for this blog post I have experimented with different values. But there are rules of thumb that you can apply. For instance, to detect a cycle, the Predictive Planning engine needs to see it at least 3 times. That means that in our example we need at least a 3-year window size so that the yearly cycles are properly detected.
  • The data granularity is also a very important factor that you have to experiment with. Let's consider a "sales" forecasting example, where you have the product line, product, shop and country dimensions. It is tempting to generate the forecasts at a very detailed granularity, by "product and shop". But it's likely that at such a detailed granularity the data is sparse (maybe some products don't sell well in some shops), or simply some patterns are not very strong. Maybe stronger patterns can be detected if the forecasts are generated by "product and country", "product line and shop" or just by "product". Basically, you have to try and see which level of granularity provides the best results.
  • Sometimes, most of the entities are predicted with the expected level of accuracy but a few of them cannot be. In such a case, you should consider filtering out the few "bad" entities and treating them separately. In our national park example, most of the parks are forecasted very accurately but Virgin Islands NP is not. The Virgin Islands were hit by two major hurricanes in 2017 and have not fully recovered since: tourism was severely disrupted and no new pattern has emerged. Providing accurate predictions for this entity is pretty much impossible.

Hope this was helpful.

souleymane
Explorer
Hello David,

Thanks for your feedback.
Just to clarify: someone who uses SAP Predictive Factory does not have the possibility to use the MAE indicator, all the more so if they work with large data volumes.

I find the idea of reducing the sample size or using key influencers interesting.

Kind Regards.