11-17-2022 7:57 AM - edited 11-30-2022 7:53 AM
Prepare yourself (Prerequisites blogpost) and watch our kick-off call or read our kick-off blogpost.
During the challenge you will get the opportunity to work hands-on with SAP HANA Cloud’s machine learning capabilities 🤖🎯.
Besides our open office hours, this thread is the place to ask questions and share your experiences with other machine learning enthusiasts. The challenge team is happy to answer any question you may have in the comments below.
Will you accept the challenge?
12-08-2022 10:57 AM
[0].attr, [0].pct
[9].attr, [9].pct
Please help interpret the results for [0] compared to [9].
12-08-2022 11:32 AM
The attribute (attr) is the feature name, and pct is the relative importance of that feature's value for the local classification decision. The feature/attribute with index 0 has the highest relative importance; the feature with index 9 is ranked 10th with respect to its relative influence on the local classification decision.
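To make the ranking concrete, here is a minimal sketch in plain pandas with synthetic data (not the actual PAL output schema), showing how attr/pct reason-code pairs for a single prediction can be sorted so that index 0 holds the most influential feature:

```python
import pandas as pd

# Hypothetical local reason codes for one prediction:
# each (attr, pct) pair names a feature and its relative importance.
reason_codes = pd.DataFrame({
    "attr": ["SENSOR3", "SUPPLIER", "SENSOR1", "MACHINE"],
    "pct":  [12.0, 45.0, 8.0, 30.0],
})

# Rank descending by pct: position 0 = strongest local influence,
# the last position = weakest of the reported reason codes.
ranked = reason_codes.sort_values("pct", ascending=False).reset_index(drop=True)
print(ranked.loc[0, "attr"])  # most influential feature for this row
```

So comparing [0] with [9] in the real output simply compares the strongest against the 10th-strongest local driver of that one prediction.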
12-10-2022 11:53 AM
plot_confusion_matrix works with model.confusion_matrix_
module hana_ml.visualizers.metrics
plot_confusion_matrix(self, df, normalize=False, **kwargs)
df : DataFrame
Data points to the resulting confusion matrix.
This dataframe's columns should match columns ('CLASS', '')
Please provide an example of a confusion matrix resulting from predictions.
12-12-2022 8:39 AM
For unified classification, you can simply use the score method, as given in this example:
# Test model generalization using the test data subset, which was not used during training
scorepredictions, scorestats, scorecm, scoremetrics = hgbc.score(data=hdf_test, key='PRODUCT_ID', label='QUALITY',
                                                                 ntiles=20,
                                                                 thread_ratio=1.0)
display(scorestats.sort('CLASS_NAME').collect())
display(scorecm.filter('COUNT != 0').collect())
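Once collected to the client, the long-format confusion-matrix result can be pivoted into a familiar square matrix. A minimal sketch with synthetic data, assuming the collected DataFrame has ACTUAL_CLASS, PREDICTED_CLASS, and COUNT columns (check your own scorecm.collect() output for the exact column names):

```python
import pandas as pd

# Synthetic stand-in for scorecm.collect(); the column names are an assumption.
cm = pd.DataFrame({
    "ACTUAL_CLASS":    ["Good", "Good", "Bad", "Bad"],
    "PREDICTED_CLASS": ["Good", "Bad",  "Good", "Bad"],
    "COUNT":           [90, 10, 5, 95],
})

# Pivot the long-format counts into an actual-vs-predicted matrix.
matrix = cm.pivot(index="ACTUAL_CLASS", columns="PREDICTED_CLASS",
                  values="COUNT").fillna(0).astype(int)
print(matrix)
```

The same pivoted frame can then be fed to any plotting routine (e.g. matplotlib's imshow) for a visual confusion matrix.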
12-10-2022 1:16 PM
from hana_ml.algorithms.apl.gradient_boosting_classification import GradientBoostingBinaryClassifier
Please provide an example of how to get model.confusion_matrix_.
Thanks. Sergiu.
12-12-2022 8:43 AM
You can find an APL classification example in the HANA-ML APL samples, e.g. the multinomial classifier.
12-12-2022 1:32 PM
During the open office calls this week on Tuesday/Thursday (see this post for timings), we want to share some additional insights and outlook. First, we want to demonstrate how you can leverage the same HANA ML capabilities directly in SAP Data Warehouse Cloud. Second, we want to highlight some time-series capabilities of HANA ML. Of course, we will have time to answer your questions.
12-12-2022 2:16 PM
The last week of our SAP HANA Cloud Machine Learning challenge has started!
Tune and finalize your models; we are looking forward to all your results presentations 🙂 this Friday, Dec 16th.
Presentation slots for your solution
· Slot 1 (8AM CET): Asia Pacific & European time zone
· Slot 2 (4PM CET): European and American time zone
See you all this Friday!
Christoph
12-13-2022 4:06 PM
Question from today's open office hour: How can I run hyperparameter search with a classifier example (random forest example)?
12-13-2022 4:12 PM
Referring to the Product Quality PAL tutorial example, here is sample code showing how to apply the new and trending successive-halving or Hyperband parameter-search options with the Unified Classification and Hybrid Gradient Boosting Tree PAL algorithms:
# Train the ProductQuality classifier model using PAL HybridGradientBoostingTree
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
cv_range = {}
# cv_range['<parameter name>'] = [start, step, end]
cv_range['learning_rate'] = [0.01, 0.01, 0.9]
cv_range['n_estimators'] = [2, 1, 100]
cv_range['split_threshold'] = [0.1, 0.1, 10]
cv_range['max_depth'] = [1, 1, 50]
# Initialize the model object
hgbc = UnifiedClassification(func='HybridGradientBoostingTree',
                             split_method='histogram', max_bin_num=1000,
                             param_range=cv_range,
                             param_search_strategy='random',
                             random_search_times=5,
                             random_state=1234,
                             # stratified_cv_hyperband or stratified_cv_sha
                             resampling_method='stratified_cv_hyperband', fold_num=5,
                             evaluation_metric='error_rate', ref_metric=['auc'],
                             # validation_set_rate=0.3,
                             # stratified_validation_set=True,
                             # tolerant_iter_num=3,
                             # timeout=60,
                             # resource='data_size',
                             # max_resource=,
                             # reduction_rate=,
                             # min_resource_rate=,
                             # aggressive_elimination=,
                             progress_indicator_id='<My_HGBT_HyperParameter_Search_Task>'
                             )
# Execute the training of the model
hgbc.fit(data=df_trainval,
         key='PRODUCT_ID',
         features=['SUPPLIER', 'MACHINE', 'SENSOR1', 'SENSOR2', 'SENSOR3', 'SENSOR4', 'SENSOR5',
                   'SENSOR6', 'SENSOR7', 'SENSOR8', 'SENSOR9', 'SENSOR10'],
         label='QUALITY', categorical_variable=['SUPPLIER', 'MACHINE', 'QUALITY'],
         ntiles=20, build_report=True,
         partition_method='user_defined', purpose='TRAIN_VAL_INDICATOR')
display(hgbc.runtime)
# Depending on the parameter search range values, runtime might be significant
# Using a separate connection, you can query the function progress table to inspect the progress of the hyperparameter search
# use a query like: SELECT * FROM _SYS_AFL.FUNCTION_PROGRESS_IN_AFLPAL WHERE EXECUTION_ID = '<My_HGBT_HyperParameter_Search_Task>';
# SELECT * FROM _SYS_AFL.FUNCTION_PROGRESS_IN_AFLPAL ORDER BY PROGRESS_TIMESTAMP DESC, PROGRESS_ELAPSEDTIME DESC, PROGRESS_CURRENT ASC;
display(hgbc.optimal_param_.collect())
#display(hgbc.statistics_.collect())
#display(hgbc.importance_.collect())
#display(hgbc.confusion_matrix_.collect())
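As a conceptual aside (plain Python, not hana_ml code), successive halving spends the search budget by repeatedly keeping only roughly the top 1/reduction_rate of candidate parameter sets while granting the survivors more resource per round; Hyperband runs several such brackets. A small sketch of that schedule with hypothetical numbers:

```python
def sha_schedule(n_candidates, min_resource, max_resource, reduction_rate=3):
    """Return (round, surviving candidates, resource per candidate) tuples
    for a successive-halving style elimination schedule."""
    rounds = []
    n, r = n_candidates, min_resource
    while n >= 1 and r <= max_resource:
        rounds.append((len(rounds), n, r))
        if n == 1:
            break
        n = max(1, n // reduction_rate)          # keep the best 1/reduction_rate
        r = min(max_resource, r * reduction_rate)  # give survivors more resource
    return rounds

for rnd, n, r in sha_schedule(27, min_resource=100, max_resource=2700):
    print(f"round {rnd}: {n} candidates, resource {r}")
```

This is only an illustration of why the reduction_rate, max_resource, and min_resource_rate parameters shown (commented out) above shape how aggressively candidate parameter sets are eliminated.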
12-13-2022 4:20 PM
In order to apply hyperparameter search with PAL Random Decision Trees (aka Random Forest), you could utilize the PAL AutoML classifier; see this AutoML Diabetes classifier example or the AutoML fraud detection classifier example from Yannick's AutoML introduction blog.
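For intuition, random parameter search simply draws each parameter independently from its range and evaluates every draw. A minimal, generic sketch in plain Python (hypothetical, not the PAL implementation), reusing the [start, step, end] range convention from the cv_range example above:

```python
import random

random.seed(1234)  # mirrors random_state for reproducible draws

# Ranges in the same [start, step, end] convention as cv_range above.
cv_range = {
    "max_depth":    [1, 1, 50],
    "n_estimators": [2, 1, 100],
}

def sample_params(ranges):
    """Draw one value per parameter from its [start, step, end] grid."""
    out = {}
    for name, (start, step, end) in ranges.items():
        n_steps = int((end - start) / step)
        out[name] = start + step * random.randint(0, n_steps)
    return out

# random_search_times=5 would correspond to evaluating 5 such draws.
candidates = [sample_params(cv_range) for _ in range(5)]
for c in candidates:
    print(c)
```

Each sampled candidate would then be trained and scored, and the best-scoring parameter set kept, which is what the engine does for you when param_search_strategy='random' is set.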
12-15-2022 8:13 AM - edited 12-15-2022 8:14 AM
Greetings!
I am new to the HANA ML challenge and have questions about presenting the results.
1. How do I book a time slot?
2. What do I need from a technical point of view to present, zoom or something else?
3. How much time is allocated to a slot?
4. To which email should I send (submit) the executed Jupyter Notebook with results before or after the presentation?
Thanks,
Sergiu
12-15-2022 8:33 AM - edited 12-15-2022 8:38 AM
Hi Sergiu,
good question!
You can just join any of the presentation slots (we are using Zoom)
Slot 1 (8AM CET): Asia Pacific & European time zone
Slot 2 (4PM CET): European and American time zone
You will have a minimum of 10 minutes; walking through your Python demo solution is ideal, but any other format for presenting your results and findings is great too.
You can email your results (after the presentation is fine) to SAPHANACloud@sap.com
One participant has already summarized the challenge findings and results in a blog post. That's extra effort, but of course an excellent way to present your findings not only to us but to a broader audience. It's certainly not required, though ...
Best, Christoph
12-16-2022 6:32 AM
The SAP HANA Cloud Machine Learning Challenge is coming to an end. Join one of our presentation calls today to present your solution (no registration required).
We are looking forward to seeing you 🤓
12-16-2022 9:06 AM - edited 12-16-2022 9:07 AM
12-16-2022 9:10 AM
We will start at 4PM CET, so the last presentation slot for participants is planned to start at 4:45PM CET.