Artificial Intelligence and Machine Learning Blogs
ChristophMorgen
Product and Topic Expert

With the 2024 Q1 database release, several new features have been released in the SAP HANA Cloud Predictive Analysis Library (PAL). An enhancement summary is available in the What’s new document for SAP HANA Cloud database 2024.02 (QRC 1/2024).

The feature highlights for the current release are described in more detail below.

Classification and Regression enhancements

Unified Regression, along with Unified Classification and Time Series, now supports permutation feature importance, a new and trending method in global explainability for evaluating the contribution of individual features to the overall predictive power of a model. It works by measuring how much a model’s performance decreases when a feature’s values are randomly shuffled. A detailed explanation and examples are given in the blog Global Explanation Capabilities in SAP HANA Machine Learning.


Classic feature importance vs permutation feature importance reports (see blog for details)
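To make the shuffling idea concrete, here is a minimal pure-Python sketch of permutation feature importance for a classifier scored by accuracy. It is not PAL’s implementation. The function names, the accuracy metric, the number of repeats and the toy model are all assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[int]],
    X: List[Row],
    y: List[int],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(X: List[Row]) -> List[int]:
    """Hypothetical classifier: uses feature 0 only and ignores feature 1."""
    return [int(row[0] > 0.5) for row in X]


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = toy_model(X)  # labels the toy model reproduces perfectly
scores = permutation_importance(toy_model, X, y)
print(scores)
```

The toy model ignores the second feature, so shuffling that feature cannot change any prediction and its importance is exactly zero. Shuffling the first feature costs roughly half of the accuracy.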

The Hybrid Gradient Boosting Tree (HGBT) now supports F1-score, recall and precision as cross-validation metrics, enabling more targeted classification models. Furthermore, target values in classification can now be weighted, either to address imbalanced classes or to reflect, for example, the different costs associated with the different class values.
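The following sketch, assuming a binary problem with 1 as the positive class, shows why these metrics can favor more targeted models on imbalanced data than plain accuracy does. The helper name is hypothetical; PAL computes its metrics internally during cross validation.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Precision, recall and F1-score for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Imbalanced toy labels: nine negatives, one positive.
y_true = [0] * 9 + [1]
always_zero = [0] * 10  # a "model" that always predicts the majority class
print(precision_recall_f1(y_true, always_zero))  # (0.0, 0.0, 0.0) even though accuracy is 0.9
```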
A new and trending regression objective function, “reweighted square”, has been introduced to help achieve more robust and regularized regression models.
For improved early stopping during model optimization, the validation metric used for early stopping can now be set explicitly.
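Why the choice of validation metric matters can be seen in a generic patience-based early-stopping rule, sketched below. The patience logic and the function name are assumptions for illustration, not PAL parameters. The point is that training stops at whichever round is best according to the chosen metric.

```python
from typing import Sequence


def early_stopping_round(
    val_scores: Sequence[float], patience: int = 3, higher_is_better: bool = True
) -> int:
    """Index of the best round; stops once `patience` rounds pass without improvement."""
    best_round, best_score = 0, val_scores[0]
    for i, score in enumerate(val_scores[1:], start=1):
        improved = score > best_score if higher_is_better else score < best_score
        if improved:
            best_round, best_score = i, score
        elif i - best_round >= patience:
            break  # no improvement for `patience` rounds: stop training
    return best_round


f1_per_round = [0.60, 0.68, 0.71, 0.70, 0.72, 0.71, 0.70, 0.69, 0.73]
print(early_stopping_round(f1_per_round, patience=3))  # 4: stops at round 7, never sees round 8

logloss_per_round = [0.90, 0.70, 0.60, 0.65, 0.62, 0.61]
print(early_stopping_round(logloss_per_round, patience=3, higher_is_better=False))  # 2
```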

The recently introduced multi-layer perceptron (MLP) recommender function now supports multiclass classification and regression recommender scenarios. This allows the recommendation task to be reformulated as a classification or regression problem. The implementation employs a dual-stream framework in which two sets of features, for example user features and item features, are fed into a feature selection module. The outputs are passed into MLP neural networks and combined in a bilinear aggregation layer. This new and trending neural network framework can handle large-scale data volumes in recommendation scenarios very effectively.
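Here is a rough pure-Python sketch of the dual-stream pattern. Each stream passes its features through a small dense layer, and a bilinear form uᵀAv combines the two resulting embeddings into one score. Several things are simplified or assumed, and none of them are PAL’s actual architecture: the feature selection module is left out, each stream is a single layer, all weights are hypothetical, and the sketch stops at a regression-style score. For multiclass recommendation, one such score per class could feed a softmax.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense_relu(x: Vector, W: Matrix, b: Vector) -> Vector:
    """One fully connected layer with ReLU activation: relu(W x + b)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def bilinear(u: Vector, A: Matrix, v: Vector, bias: float = 0.0) -> float:
    """Bilinear aggregation u^T A v + bias, combining the two streams."""
    return sum(u[i] * A[i][j] * v[j] for i in range(len(u)) for j in range(len(v))) + bias


# Hypothetical toy weights; in a trained model these would be learned.
user_features = [1.0, 0.0, 2.0]
item_features = [0.5, 1.0]
W_user, b_user = [[1.0, 0.0, 0.5], [0.0, 1.0, -1.0]], [0.0, 0.0]
W_item, b_item = [[1.0, 1.0], [2.0, -1.0]], [0.0, 0.0]
A = [[1.0, 0.0], [0.0, 1.0]]

u = dense_relu(user_features, W_user, b_user)  # user-stream embedding: [2.0, 0.0]
v = dense_relu(item_features, W_item, b_item)  # item-stream embedding: [1.5, 0.0]
score = bilinear(u, A, v)
print(score)  # 3.0
```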

The K-Nearest Neighbor (KNN) classification and regression functions have been enhanced with a new similarity search method. In addition to brute-force and KD-tree search, a matrix-enabled search method has been introduced, allowing much faster similarity search, especially with high-dimensional numeric feature data.
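The release notes do not describe how the matrix-enabled search works internally. One common technique, shown here as an assumption, rewrites squared Euclidean distances as ‖q‖² + ‖x‖² − 2·q·x. The expensive cross term then becomes a single matrix product, which optimized linear-algebra routines compute very efficiently for high-dimensional data. The pure-Python version below only demonstrates that both formulations find the same neighbors. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def sq_dists_brute_force(Q: Matrix, X: Matrix) -> Matrix:
    """Pairwise squared Euclidean distances, computed coordinate by coordinate."""
    return [[sum((q - x) ** 2 for q, x in zip(qr, xr)) for xr in X] for qr in Q]


def sq_dists_matrix(Q: Matrix, X: Matrix) -> Matrix:
    """Same distances via ||q - x||^2 = ||q||^2 + ||x||^2 - 2 q.x.

    The cross term is the matrix product Q X^T, which a BLAS-backed library
    would compute in one optimized call."""
    q_norms = [sum(v * v for v in row) for row in Q]
    x_norms = [sum(v * v for v in row) for row in X]
    cross = [[sum(a * b for a, b in zip(qr, xr)) for xr in X] for qr in Q]  # Q X^T
    # Clamp at 0.0: rounding can make the expansion slightly negative.
    return [
        [max(0.0, q_norms[i] + x_norms[j] - 2.0 * cross[i][j]) for j in range(len(X))]
        for i in range(len(Q))
    ]


def knn_indices(Q: Matrix, X: Matrix, k: int) -> List[List[int]]:
    """Indices of the k nearest reference rows for each query row."""
    return [sorted(range(len(X)), key=row.__getitem__)[:k] for row in sq_dists_matrix(Q, X)]


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]  # reference points
Q = [[0.9, 0.1]]  # one query point
print(knn_indices(Q, X, k=2))  # [[1, 0]]
```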

Auto-ML and ML pipeline function improvements

The Auto-ML functions for the Predictive Analysis Library (PAL) have been enhanced with

  • a new option to trigger deeper fine-tuning of the best pipeline found
  • a RANDOM SEARCH-based optimization that complements the genetic algorithm-based Auto-ML optimization, suited especially for smaller configurations (e.g. simple time series) and yielding faster results
  • a new method to clear and initialize the Auto-ML log
  • an explainability enhancement for Auto-ML and pipeline models: a lightweight global SHAP surrogate model for faster calculation of global explanations and faster local prediction interpretability results
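A genetic algorithm evolves a population of pipelines through crossover and mutation. Random search is simpler: it samples configurations independently and keeps the best one, which is cheap and often sufficient for small search spaces. The sketch below illustrates the idea. The search space, the operator labels and the scoring function are hypothetical stand-ins for real pipeline evaluations.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, object]


def random_search(
    space: Dict[str, List[object]],
    score: Callable[[Config], float],
    n_trials: int = 20,
    seed: int = 0,
) -> Tuple[Config, float]:
    """Sample configurations uniformly from `space`; keep the highest-scoring one."""
    rng = random.Random(seed)
    best_config: Config = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        s = score(config)  # in a real Auto-ML run this would be a pipeline evaluation
        if s > best_score:
            best_config, best_score = config, s
    return best_config, best_score


# Hypothetical search space for a small time series pipeline.
space = {"operator": ["smoothing", "arima"], "alpha": [0.1, 0.3, 0.5, 0.7], "seasonal": [True, False]}


def toy_score(cfg: Config) -> float:
    """Stand-in objective: rewards smoothing with alpha=0.3 and seasonality."""
    return float((cfg["operator"] == "smoothing") + (cfg["alpha"] == 0.3) + (cfg["seasonal"] is True))


best, best_score = random_search(space, toy_score, n_trials=50)
print(best, best_score)
```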

Text Processing

  • The Text Mining related-document and related-term analysis functions now support massively parallel invocation, allowing multiple input texts to be analyzed in parallel.

    Multiple documents (here IDs 0 and 5) are searched in parallel for related documents
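Conceptually, a related-document query scores every other document by term similarity, and independent queries can run side by side. This sketch assumes raw term counts, cosine similarity and a thread pool purely for illustration. PAL runs its text mining functions inside the database, and its weighting scheme is not described here. The query IDs 0 and 5 mirror the caption above; the documents are made up.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def term_vectors(corpus: Dict[int, str]) -> Dict[int, Counter]:
    """Bag-of-words term counts per document."""
    return {doc_id: Counter(text.lower().split()) for doc_id, text in corpus.items()}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_documents(query_id: int, vectors: Dict[int, Counter], top_k: int = 2) -> Tuple[int, List[int]]:
    """Other documents sharing at least one term with the query, most similar first."""
    scored = [(cosine(vectors[query_id], vec), doc_id) for doc_id, vec in vectors.items() if doc_id != query_id]
    ranked = sorted((pair for pair in scored if pair[0] > 0), reverse=True)
    return query_id, [doc_id for _, doc_id in ranked[:top_k]]


corpus = {
    0: "interest rate model simulation",
    1: "interest rate derivatives pricing model",
    2: "customer churn classification",
    3: "churn prediction with classification trees",
    4: "benford law fraud detection",
    5: "fraud detection in financial transactions",
}
vectors = term_vectors(corpus)
with ThreadPoolExecutor(max_workers=2) as pool:
    results = dict(pool.map(lambda q: related_documents(q, vectors), [0, 5]))
print(results)  # {0: [1], 5: [4]}
```

In CPython, threads do not speed up CPU-bound work like this because of the global interpreter lock. The pool here only illustrates the fan-out of independent queries.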

New financial data analysis functions

The newly implemented single-factor Hull-White procedure can be used to model the time evolution of interest rates, which is needed to estimate the prices of financial instruments based on interest rate derivatives.

To apply the Hull-White model, it first needs to be adapted to match existing market conditions (interest rates). This is achieved by providing the values of the model’s drift term as a time series in an input table. The simulation then returns the mean value across a given number of simulation paths (also specified as an input parameter), their variance, and the upper and lower bounds.


The chart above depicts the initial dataset used to calibrate the model, as well as the mean and confidence interval of the Hull-White simulation.
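The one-factor Hull-White short-rate model is commonly written as dr(t) = (θ(t) − a·r(t)) dt + σ dW(t), where a is the mean-reversion speed and σ the volatility. The sketch below simulates it with an Euler-Maruyama scheme and summarizes the paths at each time step. Several details are assumptions rather than documented PAL behavior. The drift-term input is read as θ(t), a and σ are constant, and the bounds are defined as mean ± 1.96 standard deviations. All parameter names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def simulate_hull_white(
    theta: Sequence[float],  # drift term theta(t), one value per time step
    r0: float,  # initial short rate
    a: float,  # mean-reversion speed
    sigma: float,  # volatility
    dt: float,  # time step length in years
    n_paths: int,  # number of simulation paths, at least 2
    seed: int = 0,
    z: float = 1.96,  # bound width, here an approximate 95% band
) -> List[Dict[str, float]]:
    """Euler-Maruyama simulation of dr = (theta(t) - a r) dt + sigma dW.

    Returns, per time step, the mean over all paths, their variance and
    mean -/+ z * standard deviation as lower/upper bounds."""
    rng = random.Random(seed)
    rates = [r0] * n_paths
    summary = []
    for th in theta:
        rates = [r + (th - a * r) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) for r in rates]
        mean = sum(rates) / n_paths
        var = sum((r - mean) ** 2 for r in rates) / (n_paths - 1)
        half_width = z * math.sqrt(var)
        summary.append({"mean": mean, "variance": var, "lower": mean - half_width, "upper": mean + half_width})
    return summary


# Flat drift chosen so that theta = a * r0, i.e. no expected change in the rate.
summary = simulate_hull_white(theta=[0.0015] * 12, r0=0.015, a=0.1, sigma=0.01, dt=1 / 12, n_paths=2000)
print(summary[-1])
```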

PAL also provides a new Benford’s Law function, a trending algorithm used to detect anomalies in numerical datasets such as financial transactions.

One of the (not so) well-known statistical observations is that in many datasets the leading significant digits are not equally distributed. If all digits were represented equally, each would appear 11.1 percent (1/9th) of the time. However, analyzing real-world datasets, e.g. the population totals in US census data, reveals that the distribution of the leading digits follows Benford’s law, also known as the first-digit law.

  • P(d) = log10(1 + 1/d), where P(d) is the probability that the leading digit is d, for d ∈ {1, 2, ..., 9}.
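Two quick consequences of the formula are worked out below. The nine probabilities telescope to exactly 1, and a leading 1 is about 2.7 times as frequent as the 11.1 percent that a uniform distribution would predict.

```latex
P(d) = \log_{10}\!\left(1 + \tfrac{1}{d}\right) = \log_{10}(d + 1) - \log_{10} d, \qquad d \in \{1, 2, \dots, 9\}

\sum_{d=1}^{9} P(d) = \log_{10} 10 - \log_{10} 1 = 1

P(1) = \log_{10} 2 \approx 0.301, \qquad P(9) = \log_{10} \tfrac{10}{9} \approx 0.046
```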


With PAL’s new BENFORD analysis function, it is now very easy to validate whether a dataset obeys Benford’s law. This is a very common first step in financial applications to detect unexpected value distributions and, for example, potentially fraudulent transaction data.
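The sketch below illustrates one such check. It does not reproduce the output format of PAL’s BENFORD function, which is not described here. It extracts leading digits and computes a chi-square goodness-of-fit statistic against the Benford probabilities. The chi-square test is our choice; the mean absolute deviation of the digit frequencies is another common measure. With 8 degrees of freedom, a statistic above roughly 15.51 rejects Benford conformity at the 5 percent level.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def leading_digit(x: float) -> int:
    """First significant digit of a non-zero number, e.g. 0.0472 -> 4.

    Uses scientific notation with 7 significant digits, so a value such as
    9.9999999 rounds up and reports 1 (acceptable for a sketch)."""
    return int(f"{abs(x):e}"[0])


def benford_expected() -> Dict[int, float]:
    """Benford probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_chi_square(values: Iterable[float]) -> float:
    """Chi-square statistic of observed leading digits against Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in benford_expected().items())


CRITICAL_5_PERCENT = 15.51  # chi-square critical value, 8 degrees of freedom

powers_of_two = [2 ** k for k in range(1, 1001)]  # known to follow Benford closely
uniform_digits = list(range(1, 10)) * 100  # every leading digit equally often
for name, data in [("powers of two", powers_of_two), ("uniform digits", uniform_digits)]:
    stat = benford_chi_square(data)
    print(name, round(stat, 2), "conforms" if stat < CRITICAL_5_PERCENT else "deviates")
```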

Python ML client (hana-ml) enhancements

The full list of new methods and enhancements in hana-ml 2.20 is summarized in the changelog for hana-ml 2.20.240319, which is part of the documentation. The key enhancements in this release include:

Time series analysis and forecasting methods

  • Time series permutation feature importance analysis
  • Time series outlier detection with voting
  • Segmented (massive) online Bayesian Change Point Detection
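The release notes do not say which detectors hana-ml combines or how votes are counted. The sketch below shows the general idea under stated assumptions. Three simple detectors (z-score, median absolute deviation, interquartile range) each flag points, and a point counts as an outlier only when a majority agrees. All function names and thresholds are hypothetical.

```python
from statistics import mean, median, pstdev, quantiles
from typing import Callable, List, Sequence

Detector = Callable[[Sequence[float]], List[bool]]


def zscore_detector(threshold: float = 2.5) -> Detector:
    """Flag points more than `threshold` population standard deviations from the mean."""
    def detect(xs: Sequence[float]) -> List[bool]:
        mu, sd = mean(xs), pstdev(xs)
        return [sd > 0 and abs(x - mu) / sd > threshold for x in xs]
    return detect


def mad_detector(threshold: float = 3.5) -> Detector:
    """Flag points whose modified z-score (median absolute deviation based) exceeds `threshold`."""
    def detect(xs: Sequence[float]) -> List[bool]:
        med = median(xs)
        mad = median([abs(x - med) for x in xs])
        return [mad > 0 and 0.6745 * abs(x - med) / mad > threshold for x in xs]
    return detect


def iqr_detector(k: float = 1.5) -> Detector:
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    def detect(xs: Sequence[float]) -> List[bool]:
        q1, _, q3 = quantiles(xs, n=4)
        iqr = q3 - q1
        return [x < q1 - k * iqr or x > q3 + k * iqr for x in xs]
    return detect


def vote(xs: Sequence[float], detectors: List[Detector], min_votes: int = 2) -> List[bool]:
    """A point is an outlier if at least `min_votes` detectors flag it."""
    flags = [detect(xs) for detect in detectors]
    return [sum(f[i] for f in flags) >= min_votes for i in range(len(xs))]


series = [10.0, 10.5, 9.8, 10.2, 10.1, 30.0, 9.9, 10.3, 10.0, 9.7]
detectors = [zscore_detector(), mad_detector(), iqr_detector()]
print([i for i, is_outlier in enumerate(vote(series, detectors)) if is_outlier])  # [5]
```

With only ten points, a z-score threshold of 3 could never fire, because a single point’s population z-score is at most √(n − 1) = 3. Voting across detectors makes the result less dependent on the blind spots of any single method.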

Auto-ML configuration and methods enhancements

  • Updated Auto-ML configuration dictionary templates with new operators and random search optimization support, e.g. for small time series configurations
  • Enhanced Auto-ML configuration option for setting connection constraints during optimization of multi-operator pipelines, and visualization of the pipeline connection scores between operators
  • Support for algorithm-specific parameters in Auto-ML predict calls, relevant for both pipeline predict and Auto-ML methods.
  • An enhanced Auto-ML progress monitor that can be displayed at any time, plus log management methods for setting log levels, persisting progress logs, cleaning up logs and more.

Exploratory data analysis and visualization enhancements

  • New Bubble Plot and Parallel Co-ordinate Plot


An example notebook illustrating the highlighted feature enhancements is available here: 24QRC01_2.20.ipynb.
