
SAP BTP Document Information Extraction Service (CA-ML-BDP)

matthewbest
Explorer

Hi,

I have a few questions that I am hoping an expert can help me with.

1. Using the SAP BTP Document Information Extraction service, suppose I create a custom schema and annotate a template for a specific document type (an invoice, for example). I then add more invoices with the same layout for prediction/extraction, but the results are not great. How many invoices of the same layout would SAP need in order to retrain/provision the global model so that accuracy improves for this 100% custom schema/template?

2. Also, are all documents saved in the DOX UI for the lifetime of the service, or are they cached somewhere within BTP where they can later be queried? After years of use, I imagine the DOX UI would be crowded with potentially thousands of documents.

3. Finally, is it possible for the DOX service to make accurate extractions if a .pdf file contains multiple documents? For example, if one .pdf file contains 3 different invoices with the same layout, can the service accurately separate them and extract from each one?

Accepted Solutions (0)

Answers (1)

tomasz_janasz
Product and Topic Expert

Hi Matthew,

here are the answers to your questions:

  1. For the Template feature, please note that it is a 1:1 relation, i.e. one layout per template. You must not add different layouts to the same template. If you notice extraction issues with certain fields, please open a ticket on component CA-ML-BDP. Our experts will evaluate it and get back to you. For continuous improvement, please use the dataForRetraining configuration: https://help.sap.com/docs/document-information-extraction/document-information-extraction/confirm-do...
  2. Documents are stored in the service for a maximum of 30 days and are then deleted: https://help.sap.com/docs/document-information-extraction/document-information-extraction/data-prote.... However, please note that the service needs to keep the documents that are associated with a template. These documents are not deleted.
  3. The best practice is to split the documents before sending for extraction.
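
For point 3, here is a minimal sketch of how the split could be planned before upload, assuming you already know on which page each invoice starts. The function name plan_page_ranges and its parameters are hypothetical and are not part of the DOX API. It only computes the page ranges; cutting the PDF and sending each part to the service would be done with a PDF library and your own client code.

```python
from typing import List, Tuple


def plan_page_ranges(total_pages: int, first_pages: List[int]) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (start, end) page ranges, one per document.

    total_pages: number of pages in the combined PDF.
    first_pages: page number on which each contained document starts.
    Hypothetical helper for illustration only, not an SAP API.
    """
    if total_pages < 1:
        raise ValueError("total_pages must be at least 1")
    starts = sorted(set(first_pages))
    if not starts or starts[0] != 1:
        raise ValueError("the first document must start on page 1")
    if starts[-1] > total_pages:
        raise ValueError("a start page lies beyond the end of the file")
    # Each document ends on the page before the next one starts;
    # the last document runs to the end of the file.
    ends = [s - 1 for s in starts[1:]] + [total_pages]
    return list(zip(starts, ends))


# A 7-page PDF holding three invoices that start on pages 1, 3 and 6.
print(plan_page_ranges(total_pages=7, first_pages=[1, 3, 6]))
# [(1, 2), (3, 5), (6, 7)]
```

Each resulting range would then be written to its own file and submitted as a separate extraction request, so that every request contains exactly one invoice.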

One more thing: the internal abbreviation is DOX. DIE seems somewhat peculiar 😉

Regards,

Tomasz (PM)

matthewbest
Explorer

Hi tomasz.janasz,

Thank you so much for your response and the clarifications on my questions above, and for responding so quickly! I truly appreciate you taking the time to answer.

haha I totally agree and I have updated my question 🙂 Again, thank you!