on 11-28-2020 4:33 PM
Dear all,
We tried to create a simple training pipeline (in a trial account), as in the image above, using the Python producer template, but we get the error below.
"Graph failure: operator.com.sap.system.python3Operator:python3operator1: Error while executing callback registered on port(s) ['input']: could not convert string to float: 'RL' [file '/home/vflow/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"
We are not sure how to convert the data (string to float) as the error message suggests.
Correction: The data has a mix of numeric and string columns from the file in GCP cloud storage. The data read operator seems to work but fails on the first python operator.
The output of the 'toString Converter' operator is of type string; the Python operator takes a string as input and produces a blob as output.
We tried placing a 'toNumberConverter' after the 'toString Converter' operator, but it is not suitable: the input to the Python operator is 'string', so the two cannot be connected. We also went through the documentation and searched for an answer but couldn't find one. We hope someone can help.
PS: The code and Dockerfile are given below (the Dockerfile was created and activated successfully).
Thank you.
Chitra T
# Code in Python operator
def on_input(data):
    import pandas as pd
    import numpy as np
    import io
    from sklearn.linear_model import LinearRegression

    df_data = pd.read_csv(io.StringIO(data), sep=",")
    # x = df_data[["HALFMARATHON_MINUTES"]]
    # y_true = df_data["MARATHON_MINUTES"]
    X = df_data.drop(['SalePrice'], axis=1)
    Y = df_data['SalePrice']
    lm = LinearRegression()
    lm.fit(X, Y)
    y_pred = lm.predict(X)
    mse = np.mean((y_pred - Y)**2)
    rmse = np.sqrt(mse)
    rmse = round(rmse, 2)
    # create & send the model blob to the output port - Artifact Producer operator
    # will use this to persist the model and create an artifact ID
    import pickle
    model_blob = pickle.dumps(lm)
    api.send("modelBlob", model_blob)

api.set_port_callback("input", on_input)
*******************************
DOCKER
*******************************
FROM $com.sap.sles.base
RUN python3.6 -m pip --no-cache-dir install --user pandas
RUN python3.6 -m pip --no-cache-dir install --user numpy
RUN python3.6 -m pip --no-cache-dir install --user scikit-learn
To me, it does not look like a problem with the input port, but with the processing within the Python code.
It is hard to tell just from reading, but I bet there are one or more non-numerical columns in the input dataset assigned to `X`; the 'RL' in the error message would be a value from such a column. Try removing them before supplying the dataset to LinearRegression's fit().
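A minimal sketch of that suggestion, runnable outside the pipeline. The CSV text and the column names `LotArea` and `Zoning` are made up for illustration; only `SalePrice` and the value 'RL' come from the post. It keeps only the numeric feature columns and then does the same string-to-float conversion that failed in the traceback:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data: 'Zoning' stands in for the string column
# that contains values such as 'RL'.
csv_text = (
    "LotArea,Zoning,SalePrice\n"
    "8450,RL,208500\n"
    "9600,RL,181500\n"
    "11250,RM,223500\n"
)

df_data = pd.read_csv(io.StringIO(csv_text), sep=",")

# Drop the target, then keep only numeric feature columns.
X = df_data.drop(["SalePrice"], axis=1).select_dtypes(include="number")
Y = df_data["SalePrice"]

# This conversion raises "could not convert string to float" when a
# string column is still present; with numeric columns only it succeeds.
X_values = np.asarray(X, dtype=float)
print(list(X.columns), X_values.shape)
```

With the string column gone, `X` could be passed to `lm.fit(X, Y)` as in the original operator code. The trade-off is that any information in the dropped columns is lost.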
Thank you Witalij, you are correct. We encoded the non-numerical column and it works fine. Cheers.
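The post does not say which encoding was used. One common option is one-hot encoding with `pandas.get_dummies`, sketched below under the assumption of a single string column; the sample data and the column names `LotArea` and `Zoning` are hypothetical:

```python
import io

import pandas as pd

# Hypothetical sample data; only 'SalePrice' and the value 'RL'
# appear in the original post.
csv_text = (
    "LotArea,Zoning,SalePrice\n"
    "8450,RL,208500\n"
    "9600,RL,181500\n"
    "11250,RM,223500\n"
)

df_data = pd.read_csv(io.StringIO(csv_text), sep=",")
features = df_data.drop(["SalePrice"], axis=1)

# Replace the string column with one 0/1 indicator column per category.
X = pd.get_dummies(features, columns=["Zoning"], dtype=float)
Y = df_data["SalePrice"]

print(X.columns.tolist())
```

Unlike dropping the column, this keeps the categorical information available to the model, at the cost of one extra column per distinct value.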