
Data Privacy Embedding Model via Core AI

Reinhardt
Discoverer

Hi,

We have been on top of all the exciting new AI developments in the SAP space, notably:
- SAP AI Core
- SAP AI Launchpad
- HANA Vector Engine

We're working internally on some proof-of-concept applications. However, one big question we can't find a definitive answer to is data privacy, specifically when using embedding models.

OpenAI told us that they don't use data submitted through their APIs to train or improve their models, and that they retain such data for a maximum of 30 days, for certain legal purposes.

The idea, of course, is to use SAP AI Core for all our AI needs, namely prompting and embedding. Is there a difference in privacy when using OpenAI embeddings through SAP AI Core versus directly through OpenAI's own APIs?

Does SAP have additional security agreements with LLM providers that apply when their models are used through SAP AI Core?

Is our internal data (and customer data) safe when we submit it to embedding models via SAP AI Core?

We want to vectorize some internal data with confidence regarding privacy. This will most assuredly be a question from customers too when we develop AI applications.

Kind Regards,

Reinhardt

MarioDeFelipe
Contributor

Hi @Reinhardt 

The SAP OpenAI embeddings integration uses LangChain's OpenAIEmbeddings class, so this cannot be avoided: when you create embeddings with OpenAIEmbeddings, the text data is sent to OpenAI's Embeddings API regardless.
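To make this concrete, here is a minimal sketch that only builds (and never sends) the HTTP request an embeddings client would issue. It assumes direct access to OpenAI's public /v1/embeddings endpoint with the text-embedding-ada-002 model; when going through SAP AI Core the URL, authentication and deployment differ, which this sketch does not model. The helper name build_embedding_request and the sample sentence are made up for illustration.

```python
import json
import urllib.request

# Assumption: direct call to OpenAI's public Embeddings endpoint. Via SAP AI Core the
# URL, auth headers and deployment differ, but the body still carries the raw text.
OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embedding_request(texts, api_key, model="text-embedding-ada-002"):
    """Hypothetical helper: build (but do not send) an embeddings HTTP request."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Made-up confidential sample text.
    docs = ["Internal memo: renewal discount for customer X is 12 percent"]
    req = build_embedding_request(docs, api_key="sk-placeholder")
    # The confidential sentence appears verbatim in the outbound payload.
    print(json.loads(req.data)["input"])
```

Running it prints the sample sentence unchanged: whatever wrapper sits on top, the request body carries the raw input text to the provider.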

Similarly, when querying the HANA vector store created from the embeddings, relevant excerpts from your private documents may be included in the prompts sent to OpenAI as part of the query. So portions of your private data can end up being sent to OpenAI's API during queries as well.
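The query path can be sketched the same way. The example below is a toy, assuming a bag-of-words count vector (toy_embed) as a stand-in for a real embedding model and a plain Python list as a stand-in for the HANA vector store; all function names and documents are hypothetical. It shows that the top-ranked private excerpts are pasted verbatim into the prompt that would be sent to the external LLM.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Rank stored documents by similarity to the query, like a vector store search."""
    q = toy_embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, toy_embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Assemble the text that would be sent to the external LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    private_docs = [
        "Supplier contract with Contoso expires in March.",
        "Employee handbook: vacation requests go through the HR portal.",
        "Contoso contract penalty clause is 5 percent per week of delay.",
    ]
    # The two Contoso excerpts end up verbatim in the outgoing prompt.
    print(build_prompt("When does the Contoso contract expire?", private_docs))
```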

If data privacy is a major concern, an alternative is to use a local embedding model instead of relying on OpenAI's API. Embedding models are comparatively small and can easily be deployed on BTP as well.

This tutorial is really good and shows how to deploy a pre-trained model (which can serve as an embedding model) on BTP:

https://developers.sap.com/tutorials/ai-core-tensorflow-byod.html
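As a rough illustration of what "local" means, the sketch below computes an embedding entirely in-process, with no network call, using feature hashing. It is a deliberately simple stand-in, not a substitute for a pre-trained model: the name local_embed and the dimension of 256 are assumptions. In practice you would serve a pre-trained embedding model from your own deployment, along the lines of the tutorial above.

```python
import hashlib
import math
import re


def local_embed(text, dim=256):
    """Hypothetical feature-hashing embedding computed in-process (no network I/O).

    Toy stand-in for a locally deployed pre-trained embedding model.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Deterministic 64-bit hash of the token.
        h = int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:8], "big")
        index = h % dim  # bucket chosen by the low bits
        sign = 1.0 if (h >> 63) & 1 else -1.0  # sign chosen by the top bit
        vec[index] += sign
    # L2-normalize so a dot product equals cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


if __name__ == "__main__":
    a = local_embed("Contoso contract expires in March")
    b = local_embed("When does the Contoso contract expire?")
    print(round(sum(x * y for x, y in zip(a, b)), 3))
```

Because the vectors are L2-normalized, their dot product is the cosine similarity, so they can be stored in a vector store just like API-generated embeddings, while the text never leaves your process.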

If data privacy is a concern, always stay local and don't use external APIs. As you mention, the responsibility for data privacy is always ours.

Reinhardt
Discoverer

Hi Mario,

Thank you very much for your detailed response. Data privacy will be at the forefront of concerns when developing grounded AI applications in the coming years. For now, we are looking into deploying a local embedding model, as you suggested.

Much Appreciated,
Reinhardt