11-17-2022 7:57 AM - edited 11-30-2022 7:53 AM
Prepare yourself (Prerequisites blogpost) and watch our kick-off call or read our kick-off blogpost.
During the challenge you will get the opportunity to work hands-on with SAP HANA Cloud’s machine learning capabilities 🤖🎯.
Besides our open office hours, this thread is the place to ask questions and share your experiences with other machine learning enthusiasts. The challenge team is happy to answer any question you may have in the comments below.
Will you accept the challenge?
12-08-2022 10:57 AM
[0].attr, [0].pct
[9].attr, [9].pct
Please help interpret the results for [0] compared to [9].
12-08-2022 11:32 AM
The attribute (attr) is the feature name, and pct is the relative importance of that feature's value for the local classification decision. The feature/attribute with index 0 has the highest relative importance; the feature with index 9 is ranked 10th with respect to its relative influence on the local classification decision.
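To make the ranking concrete, here is a minimal sketch in plain pandas with synthetic data (not the actual PAL output schema), showing how attr/pct reason-code pairs for a single prediction can be sorted so that index 0 holds the most influential feature:

```python
import pandas as pd

# Hypothetical local reason codes for one prediction:
# each (attr, pct) pair names a feature and its relative importance.
reason_codes = pd.DataFrame({
    "attr": ["SENSOR3", "SUPPLIER", "SENSOR1", "MACHINE"],
    "pct":  [12.0, 45.0, 8.0, 30.0],
})

# Rank descending by pct: position 0 = strongest local influence,
# the last position = weakest of the reported reason codes.
ranked = reason_codes.sort_values("pct", ascending=False).reset_index(drop=True)
print(ranked.loc[0, "attr"])  # most influential feature for this row
```

So comparing [0] with [9] in the real output simply compares the strongest against the 10th-strongest local driver of that one prediction.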
12-10-2022 11:53 AM
plot_confusion_matrix works with model.confusion_matrix_
module hana_ml.visualizers.metrics
plot_confusion_matrix(self, df, normalize=False, **kwargs)
df : DataFrame
Data points to the resulting confusion matrix.
This dataframe's columns should match columns ('CLASS', '')
Please provide an example of a confusion matrix resulting from predictions.
12-12-2022 8:39 AM
For unified classification, you can simply use the score method, as given in this example:
# Test model generalization using the test data subset, which was not used during training
scorepredictions, scorestats, scorecm, scoremetrics = hgbc.score(data=hdf_test, key='PRODUCT_ID', label='QUALITY',
                                                                 ntiles=20,
                                                                 thread_ratio=1.0)
display(scorestats.sort('CLASS_NAME').collect())
display(scorecm.filter('COUNT != 0').collect())
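Once collected to the client, the long-format confusion-matrix result can be pivoted into a familiar square matrix. A minimal sketch with synthetic data, assuming the collected DataFrame has ACTUAL_CLASS, PREDICTED_CLASS, and COUNT columns (check your own scorecm.collect() output for the exact column names):

```python
import pandas as pd

# Synthetic stand-in for scorecm.collect(); the column names are an assumption.
cm = pd.DataFrame({
    "ACTUAL_CLASS":    ["Good", "Good", "Bad", "Bad"],
    "PREDICTED_CLASS": ["Good", "Bad",  "Good", "Bad"],
    "COUNT":           [90, 10, 5, 95],
})

# Pivot the long-format counts into an actual-vs-predicted matrix.
matrix = cm.pivot(index="ACTUAL_CLASS", columns="PREDICTED_CLASS",
                  values="COUNT").fillna(0).astype(int)
print(matrix)
```

The same pivoted frame can then be fed to any plotting routine (e.g. matplotlib's imshow) for a visual confusion matrix.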
12-10-2022 1:16 PM
from hana_ml.algorithms.apl.gradient_boosting_classification import GradientBoostingBinaryClassifier
Please provide an example of how to get model.confusion_matrix_.
Thanks. Sergiu.
12-12-2022 8:43 AM
You can find an APL classification example in the HANA-ML APL samples, e.g. the multinomial classifier.
12-12-2022 1:32 PM
During the open office calls this week on Tuesday/Thursday (see this post for timings), we want to share some additional insights and outlook. First, we want to demonstrate how you can leverage the same HANA ML capabilities directly in SAP Data Warehouse Cloud. Second, we want to highlight some time-series capabilities of HANA ML. Of course, we will have time to answer your questions.
12-12-2022 2:16 PM
The last week of our SAP HANA Cloud Machine Learning challenge has started!
Tune and finalize your models; we are looking forward to all your results presentations 🙂 this Friday, Dec 16th.
Presentation slots for your solution
· Slot 1 (8AM CET): Asia Pacific & European time zone
· Slot 2 (4PM CET): European and American time zone
See you all this Friday!
Christoph
12-13-2022 4:06 PM
Question from today's open office hour: How can I run hyperparameter search with a classifier example (random forest example)?
12-13-2022 4:12 PM
Referring to the Product Quality PAL tutorial example, here is sample code showing how to apply the new and trending successive-halving or Hyperband parameter-search options with the Unified Classification and Hybrid Gradient Boosting Tree PAL algorithms:
# Train the ProductQuality classifier model using PAL HybridGradientBoostingTree
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
cv_range = {}
# cv_range['<parameter name>'] = [start, step, end]
cv_range['learning_rate'] = [0.01, 0.01, 0.9]
cv_range['n_estimators'] = [2, 1, 100]
cv_range['split_threshold'] = [0.1, 0.1, 10]
cv_range['max_depth'] = [1, 1, 50]
# Initialize the model object
hgbc = UnifiedClassification(func='HybridGradientBoostingTree',
                             split_method='histogram', max_bin_num=1000,
                             param_range=cv_range,
                             param_search_strategy='random',
                             random_search_times=5,
                             random_state=1234,
                             # stratified_cv_hyperband or stratified_cv_sha
                             resampling_method='stratified_cv_hyperband', fold_num=5,
                             evaluation_metric='error_rate', ref_metric=['auc'],
                             # validation_set_rate=0.3,
                             # stratified_validation_set=True,
                             # tolerant_iter_num=3,
                             # timeout=60,
                             # resource='data_size',
                             # max_resource=,
                             # reduction_rate=,
                             # min_resource_rate=,
                             # aggressive_elimination=,
                             progress_indicator_id='<My_HGBT_HyperParameter_Search_Task>'
                             )
# Execute the training of the model
hgbc.fit(data=df_trainval,
         key='PRODUCT_ID',
         features=['SUPPLIER', 'MACHINE', 'SENSOR1', 'SENSOR2', 'SENSOR3', 'SENSOR4', 'SENSOR5',
                   'SENSOR6', 'SENSOR7', 'SENSOR8', 'SENSOR9', 'SENSOR10'],
         label='QUALITY', categorical_variable=['SUPPLIER', 'MACHINE', 'QUALITY'],
         ntiles=20, build_report=True,
         partition_method='user_defined', purpose='TRAIN_VAL_INDICATOR')
display(hgbc.runtime)
# Depending on the parameter search range values, runtime might be significant
# Using a separate connection, you can query the function progress table to inspect the progress of the hyperparameter search
# use a query like: SELECT * FROM _SYS_AFL.FUNCTION_PROGRESS_IN_AFLPAL WHERE EXECUTION_ID = '<My_HGBT_HyperParameter_Search_Task>';
# SELECT * FROM _SYS_AFL.FUNCTION_PROGRESS_IN_AFLPAL ORDER BY PROGRESS_TIMESTAMP DESC, PROGRESS_ELAPSEDTIME DESC, PROGRESS_CURRENT ASC;
display(hgbc.optimal_param_.collect())
#display(hgbc.statistics_.collect())
#display(hgbc.importance_.collect())
#display(hgbc.confusion_matrix_.collect())
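As a conceptual aside (plain Python, not hana_ml code), successive halving spends the search budget by repeatedly keeping only roughly the top 1/reduction_rate of candidate parameter sets while granting the survivors more resource per round; Hyperband runs several such brackets. A small sketch of that schedule with hypothetical numbers:

```python
def sha_schedule(n_candidates, min_resource, max_resource, reduction_rate=3):
    """Return (round, surviving candidates, resource per candidate) tuples
    for a successive-halving style elimination schedule."""
    rounds = []
    n, r = n_candidates, min_resource
    while n >= 1 and r <= max_resource:
        rounds.append((len(rounds), n, r))
        if n == 1:
            break
        n = max(1, n // reduction_rate)          # keep the best 1/reduction_rate
        r = min(max_resource, r * reduction_rate)  # give survivors more resource
    return rounds

for rnd, n, r in sha_schedule(27, min_resource=100, max_resource=2700):
    print(f"round {rnd}: {n} candidates, resource {r}")
```

This is only an illustration of why the reduction_rate, max_resource, and min_resource_rate parameters shown (commented out) above shape how aggressively candidate parameter sets are eliminated.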
12-13-2022 4:20 PM
In order to apply hyperparameter search with PAL Random Decision Trees (aka Random Forest), you could utilize the PAL AutoML classifier; see this AutoML Diabetes classifier example or the AutoML fraud detection classifier example from Yannick's AutoML introduction blog.
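For intuition, random parameter search simply draws each parameter independently from its range and evaluates every draw. A minimal, generic sketch in plain Python (hypothetical, not the PAL implementation), reusing the [start, step, end] range convention from the cv_range example above:

```python
import random

random.seed(1234)  # mirrors random_state for reproducible draws

# Ranges in the same [start, step, end] convention as cv_range above.
cv_range = {
    "max_depth":    [1, 1, 50],
    "n_estimators": [2, 1, 100],
}

def sample_params(ranges):
    """Draw one value per parameter from its [start, step, end] grid."""
    out = {}
    for name, (start, step, end) in ranges.items():
        n_steps = int((end - start) / step)
        out[name] = start + step * random.randint(0, n_steps)
    return out

# random_search_times=5 would correspond to evaluating 5 such draws.
candidates = [sample_params(cv_range) for _ in range(5)]
for c in candidates:
    print(c)
```

Each sampled candidate would then be trained and scored, and the best-scoring parameter set kept, which is what the engine does for you when param_search_strategy='random' is set.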
12-15-2022 8:13 AM - edited 12-15-2022 8:14 AM
Greetings!
I am new to the HANA ML challenge and have questions about presenting the results.
1. How do I book a time slot?
2. What do I need from a technical point of view to present, zoom or something else?
3. How much time is allocated to a slot?
4. To which email should I send (submit) the executed Jupyter Notebook with results before or after the presentation?
Thanks,
Sergiu
12-15-2022 8:33 AM - edited 12-15-2022 8:38 AM
Hi Sergiu,
good question!
You can just join any of the presentation slots (we are using Zoom)
Slot 1 (8AM CET): Asia Pacific & European time zone
Slot 2 (4PM CET): European and American time zone
You will have a minimum of 10 minutes; walking through your Python demo solution is ideal, but any other format for presenting your results and findings is great too.
You can email your results (after the presentation is fine) to SAPHANACloud@sap.com
One participant has already summarized the challenge findings and results in a blog post. That's extra effort, but of course an excellent way to present your findings not only to us but to a broader audience. It's certainly not required, though ...
Best, Christoph
12-16-2022 6:32 AM
The SAP HANA Cloud Machine Learning Challenge is coming to an end. Join one of our presentation calls today to present your solution (no registration required).
We are looking forward to seeing you 🤓
12-16-2022 9:06 AM - edited 12-16-2022 9:07 AM
12-16-2022 9:10 AM
We will start at 4PM CET, so the last presentation slot for participants is planned to start at 4:45PM CET.