cancel
Showing results for 
Search instead for 
Did you mean: 

Community challenge alert! The SAP HANA Cloud Machine Learning Challenge is about to start!

susenpoppe
Employee
Employee

Prepare yourself (Prerequisites blogpost) and watch our kick-off call or read our kick-off blogpost.

During the challenge you will get the opportunity working hands-on with SAP HANA Cloud’s machine learning capabilities 🤖🎯.

Besides our open office hours, this thread is the place to ask questions and share your experiences with other machine learning enthusiasts. The challenge team is happy to answer any question you may have in the comments below.

Will you accept the challenge?

27 REPLIES 27

ChristophMorgen
Employee
Employee

Hi,

You’ve signed up for ML Community challenge "I quit! - How to prevent employee churn” kick-off on November 28th, that’s fantastic!

Review this blog for preparation details: Prerequisites for the SAP HANA Cloud Machine Learning Challenge “I quit!” – How to prevent employee .... As you will need an SAP HANA Cloud instance (Free tier or trial are not compatible with the challenge contents), and if you don’t have an adequate set-up yourselves, please reach out to the team SAPHANACloud@sap.com to get access to a suitable instance.

Looking forward to see you joining our challenge!

Best regards,

Christoph

ChristophMorgen
Employee
Employee

Join our open office hour calls, starting today

During the challenge you can join our experts to get your open questions answered and address any issues you are dealing with regarding the challenge. Please find the slots offered below:

Or simply ask you questions here in the forum!

WenjingLiu
Galactic 2
Galactic 2

Hi,
I would need to check the data uploaded onto the HANA Cloud database. How can I access SAP HANA Cloud Cockpit or SQL console (SAP HANA Data Explorer) for the following instance? Thanks!

8c7eed2e-f460-4a82-abfa-e3ede36923d7.hna1.prod-eu10.hanacloud.ondemand.com

Hi, we haven't set up full BTP applications like HANA Cloud Database Explorer available on our side for the participants. However, if you have a local HANA Cockpit / Database Explorer, you can simply establish a SQL Connection to the HANA Cloud instance from there. Alternatively, you can use a tools like DBeaver locally installed and connect (see blog). Certainly from the python side, once the data is uploaded, and you have the dataframe created, you can use dataframe methods to access and check the data, e.g. mydataframe.head(10).collect() would issue a Select top 10 * from <table>. you can see some method examples in Yannick's blog at "How can we explore our data and react to data quality issues early".

Christoph

If you have a BTP account, e.g. in BTP Trial or a Free Tier, then could you try if generic DBX opens for you at https://hana-cockpit.cfapps.us10.hana.ondemand.com/hrtt/sap/hana/cst/catalog/index.html?

Alternatively, you can use DBeaver -- an open-source IDE for databases: https://blogs.sap.com/2020/04/20/secure-connection-from-dbeaver-to-sap-hana-cloud/

Regards,
-Witalij

ChristophMorgen
Employee
Employee

As another helper to get everyone started, find another employee churn data upload and exploration using HANA dataframes notebook in our challenge repository

Sergiu
Galactic 3
Galactic 3

Hi.
In pandas dataframe data can be modified with inplace = True or by assigning results to the same dataframe, example df_data = df_data.replace(to_replace = np.nan, value = 'MSV').
In hana dataframe COLLECT allows to show data, but the inplace = True is no supported and assigning  modified data back to same dataframe is not allowed, example data_hdf = data_hdf.fillna('MSV').collect() is not supported.
Does that mean that data has to be modified in pandas and updated to hana wtih dataframe.create_dataframe_from_pandas?
COLLECT shows data. What are the methods that allow to modify hana dataframe? Please recommend notebooks with examples on how to modify hana dataframe.
Thanks. Sergiu.

HANA dataframe does not store data, as it is with the Pandas or R dataframe. I had it explained for example here: https://blogs.sap.com/2020/11/10/multi-model-in-hana_ml-2.6-for-python-dataframes/

HANA dataframe in Python represents a SELECT statement to be executed in SAP HANA database and it is executed only once a method like `collect()` is called.

There is a method `save()` that creates a table or a db view for data returned by a HANA dataframe to the database :

https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.06/en-US/hana_ml.dataframe.html#hana_m...

Pandas dataframe:
df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
df['C] = 1
df['C'] = df['C] + 2

Sergiu_0-1669906283168.png

hana-ml Tutorial - Dataframe:
d_train_new = d_train.add_constant('C',1) 

I couldn't find in tutorial how to modify a column, tried as in pandas but I got an error.
d_train_new['C'] = d_train_new['C'] + 1
TypeError: __getitem__ argument not understood: 'C'

Please help with examples on how to modify hana dataframe column. 

you can simply use the select dataframe method to create new columns

df.select('*', ('B * 4', 'D')).collect()

Sergiu
Galactic 3
Galactic 3

In presented notebooks in hana model.fit the parameter key is mandatory. 
In Python machine libraries key parameter is not necessary.
What is the role of the key hana libraries?
MODEL_SEARCH.fit(data=d_train, key= ID,
label=target,
partition_method='stratified',
partition_random_state=1,
stratified_column=target,
build_report=False)

Thanks. Sergiu.

Hi Sergiu, key is actually not mandatory with fitting a model. However if the input dataframe to a fit-call contains and Id column, you either have to drop it from the dataframe beforehand, or identify the column as key / id column so it's not treated as a feature column. With the predict / score methods e,g. of unified classification, an id-column is required. 

Hope this explains.

Christoph

Sergiu
Galactic 3
Galactic 3

Estimate car price - Load data to SAP HANA 

Pandas:
# df - pandas dataframe
df_data
.insert(0, 'CAR_ID', df_data.reset_index().index)
OR
df_data[COL_CAR_ID] = df_data.index

Please help with examples on how to copy in hana_ml.dataframe index to column.

Thanks. Sergiu.

# df - hana_ml.dataframe
df
.add_id(id_col='ID')

Great you found the solution in between!!!  😊

YannickSchaper
Employee
Employee

Today we share additional data, which was discovered from our data engineer. The additional data for example includes if an employee recently updated their LinkedIn Profile or took part in an HR training. During the open office calls this week on Tuesday/Thursday (see this post for timings), we will also demo how to use Unified Classification with PAL Hybrid Gradient Boosting (see example) and the new PAL AutoML-Classification feature (see example, blog, community call introduction, more automl examples), as two additional approaches to tackle the employ churn challenge and achieve best results …

Have fun working on the challenge!

Join us in 5 minutes for a short demo and to address your open questions in our open office hour.

Sergiu_0-1670313891841.png

 

Sergiu
Galactic 3
Galactic 3

hana_ml.dataframe - replace allows to replace values in the same column from a dictionary. 

Examples
>>> df.collect()
Aa Bb Cb
0 0 5 a
1 10 0 b
2 2 7 c
3 3 8 d
4 4 9 e

>>> df.replace(to_replace={'Aa': {0: 100, 4: 400}}).collect()
Aa Bb Cb
0 100 5 a
1 10 0 b
2 2 7 c
3 3 8 d
4 400 9 e
Pandas
>>> df = pd.DataFrame([10,20,30,40,50], columns =['speed'])
>>> df.loc[(df["speed"] <= 20), "gear"] = 'd1'
>>> df.loc[(df["speed"] > 20) & (df["speed"] <= 40), "gear"] = 'd2'
>>> df.loc[(df["speed"] > 40), "gear"] = 'd3'
>>> df

  speed gear
0 10 d1
1 20 d1
2 30 d2
3 40 d2
4 50 d3

Please help with examples for ml_hana.dataframe on how to create an additional column with values based on conditions of other columns as in pandas.

Thanks. Sergiu.

 

Sergiu
Galactic 3
Galactic 3

hana_ml.dataframe.DataFrame - replace allows to replace values from same column from a dictionary.

Examples
--------
>>> df.collect()
    Aa  Bb  Cb
    0   0   5   a
    1   10  0   b
    2   2   7   c
    3   3   8   d
    4   4   9   e
>>> df.replace(to_replace={'Aa': {0: 100, 4: 400}}).collect()
    Aa  Bb  Cb
    0   100 5   a
    1   10  0   b
    2   2   7   c
    3   3   8   d
    4   400 9   e
Pandas
>>>df = pd.DataFrame([10,20,30,40,50], columns =['speed'])
>>>df.loc[(df["speed"] <= 20), "gear"] = 'd1'
>>>df.loc[(df["speed"] > 20) & (df["speed"] <= 40), "gear"] = 'd2'
>>>df.loc[(df["speed"] > 40), "gear"] = 'd3'
>>>df
speed gear
0 10 d1
1 20 d1
2 30 d2
3 40 d2
4 50 d3

 Please provide examples for hana_ml.dataframe.DataFrame on how to create conditional column based on values of another column.

Thanks. Sergiu.

df = pd.DataFrame([10,20,30,40,50], columns =['SPEED'])
df.loc[(df["SPEED"] <= 20), "GEAR"] = 'd1'
df.loc[(df["SPEED"] > 20) & (df["SPEED"] <= 40), "GEAR"] = 'd2'
df.loc[(df["SPEED"] > 40), "GEAR"] = 'd3'
df
SPEED GEAR
0 10 d1
1 20 d1
2 30 d2
3 40 d2
4 50 d3

hana_tab = "T_GEAR"
df_remote = dataframe.create_dataframe_from_pandas(connection_context = conn, pandas_df = df,
table_name = hana_tab,
force = True,
replace = False)

df_remote_new = df_remote.select (['SPEED','GEAR',("""CASE WHEN SPEED <=20 THEN 'g1'
WHEN SPEED >20 AND SPEED <=40 THEN 'g2'
WHEN SPEED >40 THEN 'g3'
END""",'GEAR_HANA')])

df_remote_new.collect()

SPEED GEAR GEAR_HANA
0 10 d1 g1
1 20 d1 g1
2 30 d2 g2
3 40 d2 g2
4 50 d3 g3

 

Great to see pivoting from pandas dataframe code towards the HANA dataframe approach! 🙂 Excellent!

andreasforster
Employee
Employee

In one of the open office hours a question came up on creating new features out of the existing columns, without data extraction.

Here is an example, that bundles Finland, Sweden and Norway for example into Scandinavia in a new column

df_remote = conn.table('EMPLOYEE_CHURN_ORG')
df_rem_prepped = df_remote.select(['EMPLOYEE_ID', 'AGE', \
("""CASE WHEN CURRENT_COUNTRY IN ('Sweden', 'Norway', 'Finland') THEN 'Scandinavia'
WHEN CURRENT_COUNTRY = 'UK' THEN 'United Kingdom'
WHEN CURRENT_COUNTRY IS NULL THEN 'Unknown'
ELSE CURRENT_COUNTRY END""", 'REGION_LOCAL')])
df_rem_prepped.select('REGION_LOCAL').distinct().collect()

How to include in result columns from list ls_col =['EMPLOYEE_ID', 'AGE']  instead of writing directly in select?

You can make use of Python function strings like str.format() or f-strings  (python background here) to apply parameters with hana-ml / dataframe methods.

See a hana-ml example here (not exact match to ls_col)

id='####'
hdf_new=hdf_remote.filter('"uuid" = \'{}\''.format(id)).sort('date', desc=False)

 

ls_col =['SPEED', 'GEAR']
#remove [ and ] from statement
df_remote_new = df_remote.select (LS_COL,("""CASE WHEN SPEED <=20 THEN 'g1'
WHEN SPEED >20 AND SPEED <=40 THEN 'g2'
WHEN SPEED >40 THEN 'g3'
END""",'GEAR_HANA'))

# include all fields
df_remote_new = df_remote.select ('*',("""CASE WHEN SPEED <=20 THEN 'g1'
WHEN SPEED >20 AND SPEED <=40 THEN 'g2'
WHEN SPEED >40 THEN 'g3'
END""",'GEAR_HANA'))