
Issue with Document Information Extraction



I have been using DOX for 6 months now and have had no complaints about it in the (basic) use I have made of it.

However, now that I am rolling up my sleeves to use it on a large number of invoices, I am facing some intriguing situations that I'd like to share in order to get your answers:

1) I have built a schema composed of 10 fields and then a template in which I annotated 5 to 10 line items. When I test my template in the "Document" section of the DOX UI with the same document I used as a sample document in the "Template" section, here is what happens:

PAGE 1 : DOX missed 7 line items out of 50

PAGE 2 : DOX had a 100% success rate

PAGE 3 : DOX missed 17 line items

I then edited the extraction results of page 1 by adding the 7 missing line items. Nevertheless, when I test again on the same document, DOX still misses the 7 line items.

I also tested again after splitting the three-page document into three one-page documents, and the results are the same. Why does DOX not take my modifications into account, and why does it give such variable results from one page to another?
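To make the per-page comparison concrete, here is a minimal sketch of how a page-level extraction rate could be computed. The function name `extraction_rate` is hypothetical and not part of DOX; only the page 1 figures (50 line items, 7 missed) come from the post, since the line-item totals for pages 2 and 3 are not stated.

```python
def extraction_rate(total_items: int, missed_items: int) -> float:
    """Share of line items on a page that the extraction found."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    if not 0 <= missed_items <= total_items:
        raise ValueError("missed_items must be between 0 and total_items")
    return (total_items - missed_items) / total_items


# Page 1 from the post: 50 line items, 7 of them missed.
page_1 = extraction_rate(total_items=50, missed_items=7)
print(f"Page 1: {page_1:.0%}")  # prints "Page 1: 86%"
```

An overall rate across the three pages would need the total number of line items on pages 2 and 3 as well, which this post does not give.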

2) I noticed that the CONFIRM button does nothing when I edit the extraction results, whereas the SAVE button works. This seems odd to me, since I remember it working perfectly well in my "early days" of testing the service. Is there something I am missing here? Is this button going to be deprecated soon?

3) Is there a thread or a tutorial explaining how to activate the dataFeedbackCollection configuration? I went through the SAP DOX help document as well as the following links here and here, but is there a way to activate this feature once and for all so that the service becomes more accurate each time I upload a document in the "Document" section of the DOX UI?

Thanks for your help on this.



Dear Ludovic,

Thanks a lot for your feedback.

(1) Unfortunately, the current template feature does not incorporate incremental learning; it provides a static solution instead. This means that the solution is fixed once the minimum requirement of 1-2 annotated lines is met. Extending the annotation to more lines or more pages would not help.

The future ML-based template training might help in this case.

(2) "Confirm" sets the document to uneditable once the ground truth is submitted. What did you expect to happen after editing a document and pressing Confirm?

(3) The data feedback team will reply to you on this feature.

Best regards

Shu Zhen


Dear 342478,

Many thanks for your response.

(1) I understand your point and take note of it. However, given the good overall success rate in my case (82%), is there any way to bring it to 100% by editing the extraction results? Also, what could explain the fact that DOX missed lines on pages 1 and 3 but found them all on page 2, given that each of the 3 pages has exactly the same format?

Do you have any idea when the future ML-based template training will be released?

(2) I intended to use the CONFIRM button to set the prediction confidence score of all line item fields to 100% and "enable data feedback collection feature to allow documents to be used for retraining", as written in the DOX documentation. But strangely, nothing happens when I click the button.

(3) OK, waiting for their feedback then.

Kind regards,



Hi 342478,

I managed to activate the dataFeedbackCollection configuration key I mentioned in point (3). There is one thing I'd like to understand, though:

To activate it with Swagger UI, I need information from the DOX service key, which I can find through the BTP cockpit. However, I noticed that all the document templates I created through Cloud Studio lead me to a DOX UI whose instanceID differs from that of the DOX UI I can access from the BTP cockpit (I noticed this after clicking the "Change Instance" button in the DOX UI). So my question is: is my observation correct and, if so, where can I find the service key of this specific DOX instance so that I can activate the dataFeedbackCollection configuration key on the right instance?
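For readers who would rather script this step than use Swagger UI, here is a minimal sketch of how the request could be assembled from a service key. Several things in it are assumptions that this thread does not confirm: that the service key contains `url` and `swagger` entries, that the API exposes a `configuration` endpoint, and that the payload has the shape shown. The function `build_config_request` is a hypothetical helper, and obtaining the OAuth access token is left out. Please check the names against your own service key and the API reference before relying on it.

```python
import json
from urllib.parse import urljoin


def build_config_request(service_key: dict, access_token: str) -> dict:
    """Describe the HTTP call that would switch on dataFeedbackCollection.

    Assumptions (not confirmed in this thread): the service key has
    "url" and "swagger" entries, and the API accepts this payload on a
    "configuration" endpoint.
    """
    base = service_key["url"].rstrip("/") + "/"
    api_path = service_key["swagger"].strip("/") + "/"
    endpoint = urljoin(base, api_path + "configuration")
    return {
        "method": "POST",
        "url": endpoint,
        "headers": {
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        # Assumed payload shape for the configuration key.
        "body": json.dumps({"value": {"dataFeedbackCollection": "true"}}),
    }


# Placeholder values only; a real key comes from the instance you target.
example_key = {
    "url": "https://dox.example.invalid",
    "swagger": "/document-information-extraction/v1/",
}
request = build_config_request(example_key, access_token="<token>")
print(request["method"], request["url"])
```

The sketch only builds the request and does not send it. Because the request is derived entirely from the service key, a key taken from the wrong instance would configure the wrong instance, which is why the instanceID mismatch described above matters.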

I do hope this will also fix the issue of the CONFIRM button, which does not work for me. To be accurate, it works in the DOX UI accessible from the BTP cockpit (the "Confirm Extracted Values?" pop-up window appears), but it does not work in the DOX UI accessible from any project in Cloud Studio.

If you or the data feedback team could get back to me on this issue, that would be much appreciated, as it really prevents me from moving forward.




Hi Ludovic,

Apologies for digressing from the very interesting exchange with Zhen Shu; I will wait for Zhen and the data feedback team to get back to you on your open questions.

Meanwhile, I am curious to know what is keeping you from using the “Extract Data (Pre-Trained model)” activity instead of the approach explained in your question.

As you have been working with DOX for 6 months already, I am sure you have explored this option but chose not to use it for some reason. Could you please shed some light on this? I look forward to your input.

From where I stand, this pre-trained model does not require us to “define” the schema. It was also newly added in the latest version, and the similar activity previously available in the PDF SDK is now deprecated. It may be worth taking another look to see whether it helps overcome the issues in the current use case, where a higher number of invoices and deviations come into play.

  • SAP IRPA - Document Information Extraction SDK - “Extract Data (Pre-Trained model)” Activity - The output parameter “extractedData” gives direct, automatic access to all header and line item details of the document (see the help page).

Thanks & Regards,

Vishal Rathi


Hi hrrms,

I would be glad to hear whether you have made any progress on this topic. Thanks in advance!


Vishal Rathi
