Technology Blogs by Members
sourabh_sharma6
Explorer

Step-by-step process for developing data science Python scripts using the SAP HANA database on Cloud Platform


Overview


SAP Cloud Platform is an open platform-as-a-service (PaaS) that delivers in-memory capabilities, core platform services, and unique microservices for building and extending intelligent, mobile-enabled cloud applications.

Data Science is the process of deriving knowledge and insights from a huge and diverse set of data through organizing, processing and analyzing the data.

Python is a dynamic, interpreted (byte-code-compiled) language. There are no type declarations of variables, parameters, functions, or methods in source code. This makes the code short and flexible, but you lose the compile-time type checking of the source code.

DISCLAIMER: Please note that the resources and data used are for demonstration purposes only.

We will develop a simple Python script that presents data graphically, using data science packages such as pandas, matplotlib and pyhdb, by opening a database tunnel to SAP HANA Cloud Platform.

  • PYHDB is a pure Python client package for the SAP HANA Database based on the SAP HANA Database SQL Command Network Protocol.

  • MATPLOTLIB is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.

  • PANDAS is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
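Before going further, it can help to confirm that the three packages are importable in the Python environment you plan to use. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of any of the packages above.

```python
import importlib.util

# Top-level module names of the packages used in this walkthrough.
REQUIRED = ("pyhdb", "pandas", "matplotlib")


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_packages()` returns an empty list when everything is installed; any names it returns still need to be installed (for example with pip) before running the script.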


Prerequisites:

Let's start the development now:

  • Open a database tunnel to the SAP HANA Cloud Database

    • Open a command prompt and change the current directory to the tools folder of the downloaded SDK, which contains the neo.sh file. Replace username with your workstation user name.
      cd C:\Users\username\Desktop\PY\SDK\tools


    • Now enter the command below to open a database tunnel to the cloud. Replace username, databasename and password with your HANA trial account user name, database name and password.
      neo open-db-tunnel -h hanatrial.ondemand.com -a usernametrial -u username -i databasename -p password


    • Congratulations, you have successfully opened a database tunnel.
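To double-check from Python that the tunnel is accepting connections before running any queries, a small TCP probe can be used. This is only a sketch: the helper name `tunnel_is_open` is hypothetical, and the default port 30015 is assumed because it is the local port the script later in this post connects to. Your tunnel output may show a different port.

```python
import socket


def tunnel_is_open(host="localhost", port=30015, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is listening on the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

`tunnel_is_open()` only confirms that something is listening on the port. It does not validate the database credentials.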




Let's upload sample data to HANA cloud using SAP HANA studio:

 

It's time for Python development

  • Open a Python IDE and create a new file

  • Below is the code for connecting to the database and performing data analysis operations on the fetched data. Replace username and password with the database user name and password for your MDC database instance.
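    import pyhdb
    import pandas as pd
    import matplotlib
    import matplotlib.pyplot as plt

    # Connect through the local end of the database tunnel.
    connection = pyhdb.connect('localhost', 30015, 'username', 'password')
    cursor = connection.cursor()

    # Fetch the first 20 rows (date and daily high) from the demo table.
    cursor.execute("SELECT top 20 DATE, HIGH FROM SAP_HANA_DEMO.NIFTY_50_DATA")
    a = cursor.fetchall()
    connection.close()

    # Columns are unnamed, so they are addressed by position: 0 = DATE, 1 = HIGH.
    data = pd.DataFrame(a)

    matplotlib.rcParams['axes.unicode_minus'] = False
    fig, ax = plt.subplots()
    # Plot the points as unconnected markers ('o').
    ax.plot(data[1], data[0], 'o')
    ax.set_title('NIFTY-50')
    plt.show()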



  • The connection is established using the connect function from the pyhdb package, passing the server credentials.

  • We fetch the top 20 records from the table NIFTY_50_DATA and convert them into a dataframe using the DataFrame method from the pandas package.

  • Finally, a scatter plot is displayed using the matplotlib package.
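The conversion step can also be done with named columns and explicit types, which helps if the DATE column arrives as text. The sketch below makes several assumptions: the sample rows are made up for illustration and stand in for the result of fetchall(), the date format `%Y-%m-%d` is a guess that must be adjusted to your data, and the helper name `rows_to_frame` is hypothetical.

```python
from decimal import Decimal

import pandas as pd


def rows_to_frame(rows, date_format="%Y-%m-%d"):
    """Build a typed DataFrame from (DATE, HIGH) row tuples."""
    frame = pd.DataFrame(rows, columns=["DATE", "HIGH"])
    # Parse text dates into real timestamps so they sort and plot correctly.
    frame["DATE"] = pd.to_datetime(frame["DATE"], format=date_format)
    # Convert decimal or text values into plain floats.
    frame["HIGH"] = frame["HIGH"].astype(float)
    return frame.sort_values("DATE").reset_index(drop=True)


# Made-up rows for illustration only; not real NIFTY-50 values.
sample_rows = [
    ("2017-01-03", Decimal("8219.10")),
    ("2017-01-02", Decimal("8212.00")),
    ("2017-01-04", "8218.50"),
]
frame = rows_to_frame(sample_rows)
```

With a frame like this, the columns can be passed to matplotlib by name, for example `ax.plot(frame["DATE"], frame["HIGH"], 'o')`.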


Let's test the developed script

  • Run the Python script by pressing F5.

  • The scatter plot below is generated, showing how the day's highest price varies with the date.


Congratulations, you have successfully visualized data in Python using SAP HANA Cloud Platform. Please note that more complex scripts for analyzing the data can be developed in the same way, based on your requirements.
4 Comments
Former Member
0 Kudos
Hi,

 

I'm trying to do exactly the same thing. When I had the trial version, I was able to connect to the DB from pyhdb without explicitly opening a tunnel from the CLI; I just kept the DB open in the Eclipse application. So, there were no issues with that.

 

But now I have an MDC version and am not able to open the tunnel.

Do you know any way to connect to a non-trial version of HANA DB? When I try to open the DB tunnel, it throws an error saying "Database or schema '___________' not found."

 

Thank you.
sourabh_sharma6
Explorer
0 Kudos
Hi David,

Ideally, the connection string for connecting to a non-trial HANA MDC should be the same as for the trial.

I tried simulating the error you are facing. It seems you are passing either a blank or an incorrect schema name in the open-db-tunnel command.

neo open-db-tunnel -h hanatrial.ondemand.com -a usernametrial -u username -i schemaname/databasename -p password

Please replace the schemaname/databasename with your schemaID/DB.

However, if you are still facing issues, you might need to look into setting a proxy.
0 Kudos
Hi,

Your tutorial just worked out of the box. All the way to the python script. I just had a string to float error on the date field. But I guess I just need to format before forwarding to matplotlib.

 

Kind Regards

Michael P.
sourabh_sharma6
Explorer
0 Kudos
Hi,

One solution to this issue is to change the data type of the Date field to NVARCHAR when importing the data from the CSV file in SAP HANA Studio.