Graph failure - could not convert string to float

Dear all,

We tried to create a simple training pipeline (in a trial account), as in the image above, using the Python Producer template, but we get the error below.

"Graph failure: operator.com.sap.system.python3Operator:python3operator1: Error while executing callback registered on port(s) ['input']: could not convert string to float: 'RL' [file '/home/vflow/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"

We are not sure how to convert the data (string to float) as the error message says (highlighted in bold).

Correction: The file in GCP Cloud Storage has a mix of numeric and string columns. The read operator seems to work, but the graph fails at the first Python operator.

The 'toString Converter' operator outputs a string, the Python operator takes a string as input, and its output is a blob.

We tried 'toNumberConverter' after the 'toString Converter' operator, but it is not suitable: the input to the Python operator is 'string', so the two cannot be connected. We also went through the documentation and searched for an answer but couldn't find one. We hope someone can help.

PS: The code and Dockerfile are given below (the Dockerfile was created and activated successfully).

Thank you.

Chitra T

# Code in Python operator
def on_input(data):
    import io
    import pickle

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Parse the CSV string received on the input port
    df_data = pd.read_csv(io.StringIO(data), sep=",")

    # x = df_data[["HALFMARATHON_MINUTES"]]
    # y_true = df_data["MARATHON_MINUTES"]
    X = df_data.drop(['SalePrice'], axis=1)
    Y = df_data['SalePrice']

    lm = LinearRegression()
    lm.fit(X, Y)

    # Root mean squared error on the training data
    y_pred = lm.predict(X)
    mse = np.mean((y_pred - Y)**2)
    rmse = np.sqrt(mse)
    rmse = round(rmse, 2)

    # Create and send the model blob to the output port. The Artifact Producer
    # operator will use this to persist the model and create an artifact ID.
    model_blob = pickle.dumps(lm)
    api.send("modelBlob", model_blob)


api.set_port_callback("input", on_input)



*******************************
          DOCKER
*******************************
FROM $com.sap.sles.base
RUN python3.6 -m pip --no-cache-dir install --user pandas
RUN python3.6 -m pip --no-cache-dir install --user numpy
RUN python3.6 -m pip --no-cache-dir install --user scikit-learn


Here is the file we used, linked below:

storage-houseprice-train.txt

Accepted Solutions (1)


Vitaliy-R
Developer Advocate

To me, it does not look like a problem with the input port, but the processing within Python code.

It is hard to tell just from reading, but I bet there are one or more non-numerical columns in the input dataset assigned to `X`. Try removing them before supplying the dataset to the Linear Regression's fit().
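
A minimal sketch of that suggestion, assuming pandas is available in the operator's Docker image. The sample CSV, its column names other than `SalePrice`, and the helper `numeric_features` are made up for illustration and are not taken from the original file; the idea is only to list the string columns and keep the numeric ones before calling fit().

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file; the 'MSZoning' column
# holds strings such as 'RL', like the value named in the error message.
SAMPLE_CSV = """MSZoning,LotArea,OverallQual,SalePrice
RL,8450,7,208500
RM,9600,6,181500
RL,11250,7,223500
"""


def numeric_features(df, target="SalePrice"):
    """Return (X, y) where X keeps only the numeric columns other than the target."""
    X = df.drop(columns=[target]).select_dtypes(include="number")
    y = df[target]
    return X, y


df_data = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Columns that would make fit() raise "could not convert string to float"
non_numeric = df_data.select_dtypes(exclude="number").columns.tolist()

X, Y = numeric_features(df_data)
print("dropped:", non_numeric)
print("kept:", X.columns.tolist())
```

Inside the operator, the same two lines (`select_dtypes` on `X`, then `lm.fit(X, Y)`) would replace the plain `drop` call. Dropping columns discards whatever information they carry, so this is mainly useful as a quick check that the string columns are the cause.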

Thank you Witalij, you are correct. We encoded the non-numerical column and it works fine. Cheers.
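
The reply does not say which encoding was used. One common option is one-hot encoding with `pandas.get_dummies`, sketched here under the assumption that the string columns are categorical. The sample data, the `MSZoning` column name, and the helper `encode_features` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file
SAMPLE_CSV = """MSZoning,LotArea,OverallQual,SalePrice
RL,8450,7,208500
RM,9600,6,181500
RL,11250,7,223500
"""


def encode_features(df, target="SalePrice"):
    """Return (X, y) with string columns of X replaced by one-hot indicator columns."""
    # get_dummies leaves numeric columns untouched and expands string columns
    X = pd.get_dummies(df.drop(columns=[target]))
    y = df[target]
    return X, y


df_data = pd.read_csv(io.StringIO(SAMPLE_CSV))
X, Y = encode_features(df_data)
print(X.columns.tolist())
```

The resulting `X` contains only numeric or boolean columns, so it can be passed to `lm.fit(X, Y)` without the conversion error. The same encoding, with the same set of columns, would have to be applied to any data later sent to the model for prediction.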
