Technology Blogs by SAP
maria_bezrukova
Advisor
We are happy to announce a new Document Information Extraction model for invoice extraction!

If you missed the recent blog post on the new model for purchase orders, read it here!

Document Information Extraction


SAP AI Business Services offer Document Information Extraction as part of their portfolio. The service is available through the Cloud Platform Enterprise Agreement (CPEA) and also in the Pay-As-You-Go (PAYGO) model. Using this service, users can extract information from various types of business documents, such as invoices, purchase orders, and payment advice, using pre-trained AI models.

After introducing the new model for purchase orders, Document Information Extraction now offers a new model for invoices as well. Read more about the new model for purchase orders in our recent blog post here and learn about the background of Charmer models!

Better Model for Extraction of Invoice Documents


This blog post presents the new, improved pre-trained model for invoice extraction. The new model is more robust and delivers much better extraction accuracy for almost all header fields (for example, PO numbers, tax IDs, and the separation of address types). Line-item extraction is also more consistent, yielding better results on long tables.

What is New?


The new model brings significant improvements for the header fields. For example, totalAmount and invoiceNo, the central entities on an invoice, show significant improvements.

On the test data, the new model makes 25% fewer errors on the total amount than the old model, and for the invoice number it eliminates almost every second error that the previous model made.
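
Both headline numbers are phrasings of the same metric, the relative error reduction: "25% fewer errors" is a reduction of 0.25, and "almost every second error eliminated" is a reduction of close to 0.5. A minimal sketch of the arithmetic; the error counts below are purely hypothetical, since the post does not publish absolute numbers:

```python
def relative_error_reduction(old_errors, new_errors):
    """Net drop in error count, as a fraction of the old model's error count."""
    if old_errors <= 0:
        raise ValueError("old_errors must be positive")
    return (old_errors - new_errors) / old_errors

# Hypothetical counts on a shared test set, for illustration only.
print(relative_error_reduction(200, 150))  # 25% fewer errors -> 0.25
print(relative_error_reduction(200, 100))  # every second error gone -> 0.5
```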

Extraction of the vendor tax ID and bank account numbers has also improved significantly, and a new cleaning step prepares these values so that the enrichment step can better identify the sender of the invoice (the vendor).
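
The post does not describe how the new cleaning works. As an illustration only, here is a minimal sketch of one common approach, assuming the extracted values arrive as plain strings: strip formatting characters, validate IBANs with the standard ISO 13616 mod-97 check, and look the cleaned value up in vendor master data. The `vendors` dictionary and all function names are hypothetical, not part of the service.

```python
import re

def normalize_tax_id(raw):
    """Drop spaces, dots, hyphens etc. and uppercase: 'de 123.456.789' -> 'DE123456789'."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()

def normalize_iban(raw):
    """Remove whitespace and uppercase, e.g. 'de89 3704 ...' -> 'DE893704...'."""
    return re.sub(r"\s+", "", raw).upper()

def is_valid_iban(iban):
    """ISO 13616 check: move the first 4 chars to the end, map A-Z to 10-35, test mod 97 == 1."""
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", iban):
        return False
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1

def match_vendor(raw_tax_id, raw_iban, vendors):
    """Look up a vendor by cleaned tax ID first, then by a valid cleaned IBAN.

    vendors: hypothetical master data mapping normalized identifiers to vendor IDs.
    """
    tax_id = normalize_tax_id(raw_tax_id) if raw_tax_id else None
    if tax_id and tax_id in vendors:
        return vendors[tax_id]
    iban = normalize_iban(raw_iban) if raw_iban else None
    if iban and is_valid_iban(iban) and iban in vendors:
        return vendors[iban]
    return None
```

Validating the IBAN before the lookup means that a single misread digit fails the checksum instead of silently matching, or missing, a vendor.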

The new post-processing for amounts improves results especially for customers in markets such as Germany, Spain, or Korea, because it identifies decimal and thousands separators more reliably. When facing an amount like “1.000” (1 or 1000?), the model now analyses the other amounts on the document holistically and takes additional information such as currency or country into account, while remaining robust to the minor inconsistencies that can appear on real-world documents.
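
To make the “1.000” problem concrete, here is a minimal sketch of document-level separator inference. It is not the service's actual logic. It assumes amounts arrive as strings of digits and separators only, lets every amount that is unambiguous on its own vote for its decimal separator, and falls back to a hypothetical country prior only when the vote is empty or tied. Currency is left out for brevity. An amount that carries its own unambiguous cue keeps it, which is one simple way to tolerate an inconsistently formatted total.

```python
from collections import Counter
from decimal import Decimal

# Hypothetical prior: an illustrative subset of countries that use a decimal comma.
COMMA_DECIMAL_COUNTRIES = {"DE", "ES", "FR", "IT"}

def decimal_cue(raw):
    """Return '.' or ',' if this amount alone reveals its decimal separator, else None.

    Assumes the string contains only digits and separators, e.g. '1.234,56'.
    """
    s = raw.strip()
    if "." in s and "," in s:
        # Both present: the right-most one is the decimal separator.
        return "." if s.rfind(".") > s.rfind(",") else ","
    for sep, other in ((".", ","), (",", ".")):
        count = s.count(sep)
        if count > 1:
            return other  # '1.000.000': a repeated separator groups thousands
        if count == 1 and len(s) - s.index(sep) - 1 != 3:
            return sep    # '12.5' or '99,90': not a 3-digit group, so decimal
    return None           # '1.000' or '1000': ambiguous on its own

def infer_decimal_separator(amounts, country=None):
    """Majority vote over unambiguous amounts; country prior on no votes or a tie."""
    votes = Counter(cue for cue in map(decimal_cue, amounts) if cue)
    ranked = votes.most_common()
    if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
        return ranked[0][0]
    return "," if country in COMMA_DECIMAL_COUNTRIES else "."

def parse_amount(raw, doc_decimal):
    """Prefer the amount's own cue; use the document convention only if ambiguous."""
    dec = decimal_cue(raw) or doc_decimal
    thousands = "," if dec == "." else "."
    return Decimal(raw.strip().replace(thousands, "").replace(dec, "."))
```

With the line amounts 120.00, 35.50, and 1.000 and a total written as 156,50 on a German invoice, the two unambiguous dot amounts outvote the single comma, so 1.000 is read as 1, while the total keeps its own comma cue and still parses as 156.50. On a German invoice whose only amount is 1.000, the vote is empty and the country prior reads it as 1000.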

In the following example, the post-processing correctly identified the dot as the decimal separator for all numbers and still handled the (inconsistent) decimal separator in the total amount correctly:


Last but not least, the new post-processing logic is now also active for dates. It detects the day/month order of ambiguous dates (e.g., 01/04/2023 vs. 04/01/2023) more reliably, which helps our customers with non-US documents. As with amounts, the model now analyses all other dates on the document and evaluates side information such as currency or country to resolve ambiguous cases.
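
The same idea carries over to dates. A minimal sketch, again not the service's actual logic: it assumes slash-separated `a/b/yyyy` strings, treats a date as unambiguous when one of its first two parts exceeds 12, lets those dates vote for day-first or month-first order, and falls back to a hypothetical country prior when the vote is empty or tied.

```python
from collections import Counter
from datetime import date

# Hypothetical prior: countries whose documents conventionally put the month first.
MONTH_FIRST_COUNTRIES = {"US"}

def split_date(raw):
    """Split an assumed 'a/b/yyyy' string into three integers."""
    a, b, year = (int(part) for part in raw.strip().split("/"))
    return a, b, year

def order_cue(raw):
    """Return 'DMY' or 'MDY' if the date alone fixes the order, else None."""
    a, b, _ = split_date(raw)
    if a > 12 >= b:
        return "DMY"  # e.g. 15/04/2023: 15 cannot be a month
    if b > 12 >= a:
        return "MDY"  # e.g. 04/15/2023
    return None       # e.g. 01/04/2023: both readings are valid

def infer_date_order(dates, country=None):
    """Majority vote over unambiguous dates; country prior on no votes or a tie."""
    votes = Counter(cue for cue in map(order_cue, dates) if cue)
    ranked = votes.most_common()
    if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
        return ranked[0][0]
    return "MDY" if country in MONTH_FIRST_COUNTRIES else "DMY"

def parse_date(raw, doc_order):
    """Use the date's own cue if it has one, otherwise the document-level order."""
    a, b, year = split_date(raw)
    order = order_cue(raw) or doc_order
    day, month = (a, b) if order == "DMY" else (b, a)
    return date(year, month, day)
```

For example, an invoice dated 01/04/2023 with a due date of 15/05/2023 resolves to 1 April 2023, because the unambiguous due date votes for day-first order.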

How does it help?


Previously, the Chargrid model, a vision-based approach operating on pixel information, was the main workhorse for processing business documents.

It has now been replaced by our new transformer-based Charmer model, which unlocks a new level of extraction accuracy. In addition, the new model produces more reliable confidence scores for its predictions and has a smaller resource footprint.

As usual, our customers using the Document Information Extraction service embedded in SAP Central Invoice Management, SAP Concur Invoice, or SAP Business One will automatically benefit from the new model’s higher accuracy in the form of higher automation rates and fewer manual corrections.
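
More reliable confidence scores matter for automation because a downstream process can post a document without human review only when it can trust the extracted values. A minimal sketch of such a routing rule, assuming a result dictionary with a `headerFields` list whose entries carry `name` and `confidence` keys (check the service's API reference for the actual response format). The required-field list and the 0.9 threshold are hypothetical choices, not product defaults.

```python
# Assumed response shape (hypothetical; verify against the API reference):
# {"extraction": {"headerFields": [{"name": "invoiceNo", "value": "4711", "confidence": 0.98}, ...]}}

REQUIRED_FIELDS = ("invoiceNo", "totalAmount", "senderName")  # hypothetical choice

def route_document(result, threshold=0.9, required=REQUIRED_FIELDS):
    """Return ('auto', []) or ('manual_review', fields_needing_a_check)."""
    header = result.get("extraction", {}).get("headerFields", [])
    fields = {field["name"]: field for field in header}
    to_check = [
        name for name in required
        if name not in fields or fields[name].get("confidence", 0.0) < threshold
    ]
    return ("manual_review", to_check) if to_check else ("auto", [])
```

The better calibrated the scores are, the more safely such a threshold can be lowered, which is how higher extraction quality turns into a higher automation rate.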

Do you have any questions left on this subject? Put them in the comments!

Follow the Document Information Extraction tag to never miss the newest updates from Document Information Extraction!

 




Learn more


Read more about what's new in Document Information Extraction on the help portal!

What is Document Information Extraction?


Document Information Extraction is one of the SAP AI Business Services on the SAP Business Technology Platform (SAP BTP). This ML-enabled service is available through the Cloud Platform Enterprise Agreement (CPEA) and also in the Pay-As-You-Go (PAYGO) model.


2 Comments
peter_munt4
Participant
Hi

We've just installed SAP Central Invoice Management (SAP CIM) in DEV, and by default it uses OCR and Document Information Extraction from SAP AI Business Services, with an optional custom OCR and information extraction service.

Are we meant to incorporate what you have described here into SAP CIM, and if so, how can SAP CIM benefit from it? Are there any specific guides on this for SAP CIM?
tomasz_janasz
Product and Topic Expert

Hi Peter,

Sorry for the late reply to your question. The SAP-managed AI for supplier invoices that is natively integrated into CIM is "global" in nature: once it is improved, all customers and stakeholders who use Document Information Extraction (such as Ariba CIM or SAP Concur) benefit. There is no need for you to configure anything. You can simply lean back and enjoy the improvements 🙂

Best regards,
Tomasz (PM for Document Information Extraction)