Application Development Discussions

Community challenge alert! The SAP HANA Cloud Machine Learning Challenge is about to start!

susenpoppe
Product and Topic Expert

Prepare yourself (see the prerequisites blog post) and watch our kick-off call or read our kick-off blog post.

During the challenge, you will get the opportunity to work hands-on with SAP HANA Cloud’s machine learning capabilities 🤖🎯.

Besides our open office hours, this thread is the place to ask questions and share your experiences with other machine learning enthusiasts. The challenge team is happy to answer any question you may have in the comments below.

Will you accept the challenge?

55 replies


[0].attr, [0].pct
[9].attr, [9].pct

Please help me interpret result [0] compared to result [9].

ChristophMorgen
Product and Topic Expert

The attribute is the feature name, and pct is the relative importance of the feature's value for the local classification decision. The feature/attribute with index 0 has the highest relative importance; the feature with index 9 is ranked 10th with respect to its relative importance/influence on the local classification decision.
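To make the ranking concrete, here is a minimal sketch in plain Python. It assumes the reason code for a single prediction is a JSON array of objects with "attr", "val" and "pct" keys, already ordered from most to least influential, which is what the [0]/[9] indexing in the question suggests. The sample string and its values are invented for illustration; they are not output from the challenge data.

```python
import json

# Hypothetical reason code for one prediction (illustrative values only).
# Assumed format: JSON array ordered by descending relative importance.
reason_code = (
    '[{"attr": "SENSOR3", "val": 0.82, "pct": 41.0},'
    ' {"attr": "MACHINE", "val": 0.35, "pct": 17.5},'
    ' {"attr": "SENSOR7", "val": -0.12, "pct": 6.0}]'
)


def ranked_reasons(code: str):
    """Return (rank, attribute, pct) tuples; list index 0 becomes rank 1."""
    entries = json.loads(code)
    return [(i + 1, e["attr"], e["pct"]) for i, e in enumerate(entries)]


for rank, attr, pct in ranked_reasons(reason_code):
    print(f"rank {rank}: {attr} (pct={pct})")
```

Under these assumptions, index [0] maps to rank 1 and index [9] to rank 10, so comparing [0] with [9] compares the most and the tenth most influential feature for that single prediction.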

Sergiu
Contributor

plot_confusion_matrix works with model.confusion_matrix_

module hana_ml.visualizers.metrics
plot_confusion_matrix(self, df, normalize=False, **kwargs)

df : DataFrame
Data points to the resulting confusion matrix.
This dataframe's columns should match columns ('CLASS', '')

Please provide an example of a confusion matrix computed from predictions.

ChristophMorgen
Product and Topic Expert

For unified classification, you can simply use the score method, as shown in this example:

# Test model generalization using the test data subset, not used during training
scorepredictions, scorestats, scorecm, scoremetrics = hgbc.score(data=hdf_test, key='PRODUCT_ID', label='QUALITY',
                                                                 ntiles=20, thread_ratio=1.0)
display(scorestats.sort('CLASS_NAME').collect())
display(scorecm.filter('COUNT != 0').collect())
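If you need the confusion matrix as a 2-D table, for example to plot it yourself, you can pivot long-format rows locally. The sketch below is plain Python and assumes each row is an (actual class, predicted class, count) triple; the exact column layout of scorecm may differ between hana_ml versions, and the sample labels are invented for illustration. The helper counts_from_predictions (a hypothetical name) builds the same triples directly from lists of actual and predicted labels, e.g. taken from a collected prediction result.

```python
from collections import Counter


def counts_from_predictions(actual_labels, predicted_labels):
    """Count (actual, predicted) label pairs into (actual, predicted, count) rows."""
    pairs = Counter(zip(actual_labels, predicted_labels))
    return [(a, p, n) for (a, p), n in pairs.items()]


def matrix_from_counts(rows):
    """Pivot (actual, predicted, count) rows into matrix[actual][predicted]."""
    classes = sorted({r[0] for r in rows} | {r[1] for r in rows})
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for actual, predicted, count in rows:
        matrix[actual][predicted] += count
    return classes, matrix


# Hypothetical labels for illustration only
actual = ["GOOD", "GOOD", "BAD", "BAD", "GOOD"]
predicted = ["GOOD", "BAD", "BAD", "BAD", "GOOD"]

classes, matrix = matrix_from_counts(counts_from_predictions(actual, predicted))
print("actual/predicted", *classes)
for a in classes:
    print(a, *(matrix[a][p] for p in classes))
```

Every class seen as either actual or predicted gets a row and a column, so classes that are never predicted still show up with zero counts.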

Sergiu
Contributor
from hana_ml.algorithms.apl.gradient_boosting_classification import GradientBoostingBinaryClassifier

Please provide an example of how to get model.confusion_matrix_.

Thanks. Sergiu.

ChristophMorgen
Product and Topic Expert

You can find APL classification examples in the HANA-ML APL samples, e.g. the multinomial classifier example.

YannickSchaper
Product and Topic Expert

During the open office calls this week on Tuesday/Thursday (see this post for timings), we want to share some additional insights and an outlook. First, we want to demonstrate how you can leverage the same HANA ML capabilities directly in SAP Data Warehouse Cloud. Second, we want to highlight some time series capabilities of HANA ML. Of course, there will also be time to answer your questions.

ChristophMorgen
Product and Topic Expert

The last week of our SAP HANA Cloud Machine Learning Challenge has started!

Tune and finalize your models; we are looking forward to all your results presentations this Friday, 16th Dec. 🙂

Presentation slots for your solution

  • Slot 1 (8AM CET): Asia Pacific & European time zone
  • Slot 2 (4PM CET): European and American time zone

See you all this Friday!

Christoph

ChristophMorgen
Product and Topic Expert

Question from today's open office hour: How can I run a hyperparameter search with a classifier (e.g. random forest)?

Referring to the Product Quality PAL tutorial example, here is sample code showing how to apply the new successive halving or hyperband parameter search options with Unified Classification and the PAL Hybrid Gradient Boosting Tree algorithm:

# Train the ProductQuality classifier model using PAL HybridGradientBoostingTree
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

cv_range = {}
# cv_range['<parameter name>'] = [start, step, end]
cv_range['learning_rate'] = [0.01, 0.01, 0.9]
cv_range['n_estimators'] = [2, 1, 100]
cv_range['split_threshold'] = [0.1, 0.1, 10]
cv_range['max_depth'] = [1, 1, 50]

# Initialize the model object
hgbc = UnifiedClassification(
    func='HybridGradientBoostingTree',
    split_method='histogram', max_bin_num=1000,
    param_range=cv_range,
    param_search_strategy='random',
    random_search_times=5,
    random_state=1234,
    # resampling options include 'stratified_cv_hyperband' and 'stratified_cv_sha'
    resampling_method='stratified_cv_hyperband', fold_num=5,
    evaluation_metric='error_rate', ref_metric=['auc'],
    # validation_set_rate=0.3,
    # stratified_validation_set=True,
    # tolerant_iter_num=3,
    # timeout=60,
    # resource='data_size',
    # max_resource=<value>,
    # reduction_rate=<value>,
    # min_resource_rate=<value>,
    # aggressive_elimination=<value>,
    progress_indicator_id='<My_HGBT_HyperParameter_Search_Task>'
)


# Execute the training of the model
hgbc.fit(data=df_trainval,
         key='PRODUCT_ID',
         features=['SUPPLIER', 'MACHINE', 'SENSOR1', 'SENSOR2', 'SENSOR3', 'SENSOR4', 'SENSOR5',
                   'SENSOR6', 'SENSOR7', 'SENSOR8', 'SENSOR9', 'SENSOR10'],
         label='QUALITY',
         categorical_variable=['SUPPLIER', 'MACHINE', 'QUALITY'],
         ntiles=20, build_report=True,
         partition_method='user_defined', purpose='TRAIN_VAL_INDICATOR')

display(hgbc.runtime)
# Depending on parameter search range values, runtime might be significant

# Using a separate connection, you can query the function progress table to inspect the progress of the hyperparameter search,
# e.g. with a query like: SELECT * FROM _SYS_AFL.FUNCTION_PROGRESS_IN_AFLPAL WHERE EXECUTION_ID = '<My_HGBT_HyperParameter_Search_Task>';
# or: SELECT * FROM _SYS_AFL.FUNCTION_PROGRESS_IN_AFLPAL ORDER BY PROGRESS_TIMESTAMP DESC, PROGRESS_ELAPSEDTIME DESC, PROGRESS_CURRENT ASC;

display(hgbc.optimal_param_.collect())
#display(hgbc.statistics_.collect())
#display(hgbc.importance_.collect())
#display(hgbc.confusion_matrix_.collect())
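To see why the search range drives runtime, and what successive halving does with a handful of random candidates, here is a plain-Python sketch that does not use hana_ml. It assumes each [start, step, end] entry of cv_range is expanded with an inclusive end value, draws 5 random configurations to mirror random_search_times=5, and then runs a generic successive halving loop: score all remaining candidates on the current budget, keep the better half, double the budget. Hyperband, roughly, runs several such halving brackets with different starting budgets. toy_score is a hypothetical stand-in for a cross-validated error; PAL's actual evaluation and its resource settings (resource, max_resource, reduction_rate, ...) may behave differently.

```python
import math
import random

# Same [start, step, end] ranges as cv_range above
cv_range = {
    "learning_rate": [0.01, 0.01, 0.9],
    "n_estimators": [2, 1, 100],
    "split_threshold": [0.1, 0.1, 10],
    "max_depth": [1, 1, 50],
}


def expand(start, step, end):
    """Candidate values for one range, assuming the end value is inclusive."""
    n = round((end - start) / step) + 1
    return [round(start + i * step, 10) for i in range(n)]


grid = {name: expand(*spec) for name, spec in cv_range.items()}
grid_size = math.prod(len(values) for values in grid.values())
print(grid_size)  # 90 * 99 * 100 * 50 = 44,550,000 combinations

rng = random.Random(1234)
# Random search: sample 5 configurations instead of the full grid
candidates = [{name: rng.choice(values) for name, values in grid.items()}
              for _ in range(5)]


def toy_score(params, budget):
    """Hypothetical error: distance of learning_rate from 0.1 plus noise shrinking with budget."""
    return abs(params["learning_rate"] - 0.1) + rng.uniform(-1, 1) / budget


def successive_halving(configs, min_budget=1, eta=2):
    """Keep the best 1/eta of the configs each round and multiply the budget by eta."""
    budget = min_budget
    while len(configs) > 1:
        ranked = sorted(configs, key=lambda c: toy_score(c, budget))
        configs = ranked[: max(1, len(configs) // eta)]
        budget *= eta
    return configs[0]


best = successive_halving(candidates)
print(best)
```

The grid size shows that an exhaustive search over these ranges is out of reach, which is why random sampling combined with early elimination of weak candidates keeps the search tractable.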

 


To apply a hyperparameter search with the PAL Random Decision Trees (aka Random Forest), you could use the PAL AutoML classifier; see this AutoML diabetes classifier example or the AutoML fraud detection classifier example from Yannick's AutoML introduction blog.

Sergiu
Contributor

Greetings!

I am new to the HANA ML challenge and have questions about presenting the results.
1. How do I book a time slot?
2. What do I need from a technical point of view to present: Zoom or something else?
3. How much time is allocated to a slot?
4. To which email address should I submit the executed Jupyter Notebook with the results, and should I send it before or after the presentation?

Thanks,
Sergiu

ChristophMorgen
Product and Topic Expert

Hi Sergiu, 

Good question!

You can just join any of the presentation slots (we are using Zoom)

Slot 1 (8AM CET): Asia Pacific & European time zone 

Slot 2 (4PM CET): European and American time zone

You will have a minimum of 10 minutes. Going over your Python demo solution, or any other format to present your results and findings, is great.

You can email your results to SAPHANACloud@sap.com (sending them after the presentation is fine).

One participant has already summarized their challenge findings and results in a blog post. That's extra effort, but of course an excellent way to present your findings not only to us but to a broader audience. Certainly, it's not required.

Best, Christoph

susenpoppe
Product and Topic Expert

The SAP HANA Cloud Machine Learning Challenge is coming to an end. Join one of our presentation calls today to present your solution (no registration required).

Join us on

  • December 16, 2022, 8AM CET (4PM KST) | Join here
  • December 16, 2022, 4PM CET (10AM EST) | Join here

We are looking forward to seeing you 🤓

Sergiu
Contributor

Good day!
At what time is the last slot for presentations?
Thanks.

susenpoppe
Product and Topic Expert

We will start at 4PM CET, so the last presentation slot for participants is planned to start at 4:45PM CET.