Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
JonGooding
Advisor
Advisor

Extending on my fitness related datasets and using some more 2024 based capabilities, I will not go into details on how I get it into Datasphere, as I covered this previously with a Travel Sustainability Blog

Digging into further analysis of the data, using some newer technologies and datasets, I want to focus on a couple of areas:

  1. Finding Similar Activities using the HANA Vector Engine
  2. Using External Datasets to enrich data, and in my case find areas of concern using HANA Spatial Engine (not new but always good to use newer datasets). 
  3. Restricting data between users using SAP Data Access Controls.

Also additional SAP Analytics Cloud examples developed will be demo'ed as part of a Webinar that will be linked to this blog. 

With a bit more details:

  1. Finding Similar Events using the HANA Vector Engine

Using a dataset of about 2000 different activities, the goal was to find similar efforts, irrespective of location. Here is a sample of a few records to get an idea of the dataset:

Sample Activity DatasetSample Activity Dataset

 Using the columns: Elapsed Time, Distance. Relative Effort and Elevation Gain - I created vectors in the HANA Cloud based on these values:

 

ALTER TABLE LOCAL_ACTIVITIES ADD (EMBEDDING REAL_VECTOR); 

UPDATE LOCAL_ACTIVITIES SET EMBEDDING = TO_REAL_VECTOR('[
'||to_varchar("Elapsed Time")||',
'||to_varchar("Distance")||',
'||to_varchar("Relative Effort")||',
'||to_varchar("Elevation Gain")||']')
WHERE "Elapsed Time" IS NOT NULL
AND "Distance" IS NOT NULL
AND "Relative Effort" IS NOT NULL;

 

 

Then creating a view on this, which can be used as a view / SQL query in Datasphere:

 

CREATE VIEW RELATED_ACTIVITIES_VECTOR_V as
SELECT 
A."Activity Date" AS "SourceActivityDate", B."Activity Date"  AS "TargetActivityDate", A."Activity Name" AS  "SourceActivityName",
B."Activity Name" as "TargetActivityName", A."Distance" AS  "SourceDistance",
B."Distance" AS  "TargetDistance", A."Relative Effort"  AS "SourceRelativeEffort", B."Relative Effort" AS  "TargetRelativeEffort",
A."Elevation Gain"  AS "SourceElevationGain", B."Elevation Gain" AS  "TargetElevationGain",
cosine_similarity(A.EMBEDDING, B.EMBEDDING) COS,
l2distance(A.EMBEDDING, B.EMBEDDING) L2
FROM LOCAL_ACTIVITIES A, LOCAL_ACTIVITIES B
WHERE A."Activity ID" > B."Activity ID"
AND cosine_similarity(A.EMBEDDING, B.EMBEDDING) > .999999
AND l2distance(A.EMBEDDING, B.EMBEDDING) < 100;

 

Then making the query available in Datasphere and presenting the results in SAP Analytics Cloud:

HANA Vector Query Output - in SAP Analytics CloudHANA Vector Query Output - in SAP Analytics Cloud

2. Using External Datasets to enrich data, and in my case find areas of Concern using HANA Spatial Engine. 

In a separate Datasphere space, we have a range of Global Crisis Datasets. For me and my fitness data, I thought the simplest would be determining if I ran near any fires as part of my run. Yes, this data is hindsight - but either way an interesting use case. we easily build a BTP Build App that is preventative to the user, but I'll leave that for someone else in my team to build. 

So picking Queensland and New South Wales fire data, and joining them together:

Linking QLD and NSW Fire datasetsLinking QLD and NSW Fire datasets

In a relatively simple query, I can perform geospatial analysis query based on location and time:

 

SELECT distinct "L_ROW", "Activity ID", "ACTIVITY_DATE", "Activity Type", "SOURCE", "DESCRIPTION", TO_INTEGER("GEO".ST_TRANSFORM( 3857 ).ST_Distance("ST_POINT_3857", 'kilometer')) "Distance_KM", 
    "A"."LATITUDE" "ACTIVITY_LATITUDE", 
    "A"."LONGITUDE" "ACTIVITY_LONGITUDE", 
    "C"."LATITUDE" "CRISIS_LATITUDE", 
    "C"."LONGITUDE" "CRISIS_LONGITUDE"
FROM "INITIAL" "A", "BTP_CRISIS.Crisis_Dataset" "C"
where "GEO".ST_TRANSFORM( 3857 ).ST_Distance("ST_POINT_3857", 'kilometer') < 2
and "ACTIVITY_DATE" BETWEEN add_days("PUBLISHED", -1) AND add_days("PUBLISHED", 1)

 

This query is finding all the fire warnings that were published within 1 day, and within 2 km of my start point of the run. I could have also made these parameters, so that the user can also dynamically choose the timescale and the distance. I could also have used the query to be based on any point on the run. Maybe as a V2.

concernedview.png

Once, I enable this data to be viewed in SAC, I can quickly get the output:

Local Concerns from my Activity locationLocal Concerns from my Activity location

3. Restricting data between users using SAP Data Access Controls.

The data I am loading into the SAP Datasphere instance is multi-user and there is a requirement to keep it isolated across users. So to ensure data privacy, the Datasphere Data Access Controls 

The datasets is unique by user of the Strava Id, so we can simply apply Data Access Controls to limit to their own datasets. For this, a reference table / view is needed, and we simply create a Data Access Control over based on the requirements:

Creating Data Access Controls in DatasphereCreating Data Access Controls in Datasphere

Applying the Data Access Controls to the appropriate Views:

Apply to the ViewApply to the View

 

Linking the Data Access ControlLinking the Data Access Control

 

Now we have a platform that allows us to performs some rich data analysis using modern tools that suit the individual - using their own datasets. Some examples that have been used are around Jupyter Notebooks, DBeaver Analysis and also using text input the Vector engine to match this to the existing datasets. More to come in future blogs and Webinars. 

Personally, I think it's so much more interesting using you own datasets, in this case a personal fitness dataset - or in the business world , your own systems data - which can easily be supplemented into this technology to gain some invaluable insights. 

If you have any other scenarios, let me know and I'll see if I - or we can cover it as part of our next Workshop on this. 

Cheers

Jon

4 Comments