Smart Document Extraction using SAP Intelligent RP...

former_member659766 · 10-01-2021

Introduction

One of the most important aspects of automating business processes is extracting data reliably from documents such as invoices, purchase orders, payment advices, or other custom documents. In a traditional workflow, the information trapped in these documents has to be read and interpreted manually, a tedious process that costs organizations time and money.

One such example is shown below: John, who manages the accounts, has to go through the hassle of manually downloading and reading each invoice and entering its data into the system.

With SAP Intelligent RPA integrated with the Document Information Extraction service, John can now relax and complete these tedious tasks with the click of a button, as shown in the picture below.

The Document Information Extraction service is part of the SAP AI Business Services portfolio. As a pre-trained service, it uses deep-learning algorithms to extract structured semantic information from unstructured documents. In addition, specialized models are available for the most common document types, providing better extraction results and additional capabilities.

What makes it smart and easy?

SAP Intelligent RPA lets you extract data from scanned documents and images in a user-friendly, low-code/no-code manner using dedicated activities.

John has some questions about extracting data from documents. Let's look at them:

Q1. I have multiple invoices in different languages and layouts, with generic fields such as Invoice Number, Purchase Order Number, Date, Amount, and so on that need to be extracted. Can you extract these fields by specifying just the Document Type and the Document Path?

The answer is YES. There is an activity for this that you can use by simply dragging and dropping it from the activities list.

The "Extract data without template" activity returns all the generic data extracted by the global ML model of the Document Information Extraction service. The output can be used as shown in the image below, where all the generic fields are listed. For each field, both the extracted value and the confidence score produced by the ML model are available, so you can build logic around them.
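Because every field comes with a confidence score, a common pattern is to accept high-confidence values automatically and route the rest to a person for review. The sketch below illustrates that idea in Python. It assumes the activity output has already been mapped to simple (name, value, confidence) records; the `ExtractedField` type, the field names, and the 0.8 threshold are illustrative assumptions, not the actual output schema of the activity.

```python
from dataclasses import dataclass

# Hypothetical record for one extracted header field. The real activity
# output structure may differ; map it to this shape first.
@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed to be in the range 0.0 .. 1.0

# Assumed cut-off; tune it against your own documents.
CONFIDENCE_THRESHOLD = 0.8

def split_by_confidence(fields, threshold=CONFIDENCE_THRESHOLD):
    """Return (accepted, needs_review) lists based on each field's confidence."""
    accepted, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review

if __name__ == "__main__":
    sample = [
        ExtractedField("documentNumber", "INV-1001", 0.97),
        ExtractedField("purchaseOrderNumber", "4500012345", 0.91),
        ExtractedField("documentDate", "2021-09-28", 0.62),
    ]
    ok, review = split_by_confidence(sample)
    print([f.name for f in ok])      # ['documentNumber', 'purchaseOrderNumber']
    print([f.name for f in review])  # ['documentDate']
```

In a bot, the "needs review" list could, for example, trigger a notification instead of writing the values straight into the target system.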

Q2. I can extract the fields using the approach above, but the invoice's Receiver Name is not extracted. Is there a way to annotate just that one field and have the others extracted automatically?

The answer is YES. There is a convenient way to declare fields that the global model cannot extract and that need user annotations so they can be recognized in subsequent documents. Start by declaring a Document Template, as shown below:

Provide a sample document along with a few other parameters to start annotating. After you press the Create button, the annotation UI opens, as shown below, where you can annotate the fields that you think cannot be extracted automatically.

In the example above, only one field is annotated; the other fields are extracted automatically by the global ML model. Once the annotations are done, you can use this template in your automations by dragging and dropping another convenient activity, as shown below:

Select the document template from the list of available templates and provide the document path to extract the data.
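Conceptually, template-based extraction combines two sources: the fields you annotated (such as Receiver Name) and the fields the global model recognizes on its own. The following Python sketch shows that combination as a simple dictionary merge. It is only an illustration; the actual merging happens inside the Document Information Extraction service, and the rule used here (annotated values take precedence) as well as the `merge_extractions` helper and the field names are assumptions.

```python
# Conceptual sketch only: the real merging is done by the service.
# Assumption: values from the annotated template take precedence over
# values produced by the global ML model.

def merge_extractions(global_fields: dict, template_fields: dict) -> dict:
    """Combine two {field_name: value} dicts, preferring template values."""
    merged = dict(global_fields)    # start with everything the global model found
    merged.update(template_fields)  # annotated fields override or fill gaps
    return merged

if __name__ == "__main__":
    from_global_model = {
        "documentNumber": "INV-1001",
        "grossAmount": "1250.00",
        "receiverName": None,  # not recognized by the global model
    }
    from_template = {"receiverName": "ACME Corp."}  # the one annotated field
    print(merge_extractions(from_global_model, from_template))
    # {'documentNumber': 'INV-1001', 'grossAmount': '1250.00', 'receiverName': 'ACME Corp.'}
```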

Q3. I have various other documents besides invoices, purchase orders, and payment advices, and I would like to extract their text. Do you support text extraction in such cases, and do you provide activities to fetch information from these documents?

The answer is YES. There is a dedicated activity that serves as an alternative to the existing "Open PDF" activity. Previously, text operations could only be performed on machine-readable PDFs, but with the new "Open Document (Online OCR)" activity, scanned documents and images are also supported. These activities are available in the PDF SDK module.
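Once a scanned document has been turned into text, pulling out a value usually comes down to simple text operations such as searching for a label and reading what follows it. The Python sketch below shows one such operation with a regular expression. The sample text and the `find_after_label` helper are hypothetical; they are not the output format or API of the "Open Document (Online OCR)" activity.

```python
import re

# Hypothetical OCR result for a delivery note; the real activity output may differ.
ocr_text = """Delivery Note
Reference: DN-2021-0457
Shipped on 28.09.2021
"""

def find_after_label(text: str, label: str):
    """Return the first token following 'label:' in the text, or None if absent."""
    match = re.search(rf"{re.escape(label)}\s*:\s*(\S+)", text)
    return match.group(1) if match else None

if __name__ == "__main__":
    print(find_after_label(ocr_text, "Reference"))  # DN-2021-0457
```

Matching on labels tolerates small layout differences between documents, but OCR errors in the label itself (for example "Ref3rence") will still cause a miss.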

License

The Document Information Extraction feature in SAP Intelligent RPA is available to all customers at no additional cost and without an additional license.

This feature is already available as of the 2109 release.

What's next?

Building on Document Information Extraction, SAP Intelligent RPA will provide a set of convenient activities for enriching the information extracted from documents with your own master data records.
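To give an idea of what enrichment with master data can mean, the Python sketch below matches an extracted supplier name against a small master data table using fuzzy string matching from the standard library (`difflib.get_close_matches`). This is an illustration of the general technique under stated assumptions, not the upcoming SAP activities: the `SUPPLIER_MASTER` table, the `enrich_supplier` helper, and the 0.6 cutoff are all hypothetical.

```python
import difflib

# Hypothetical supplier master data: display name -> supplier ID.
SUPPLIER_MASTER = {
    "ACME Corporation": "S-0001",
    "Globex Industries": "S-0002",
    "Initech GmbH": "S-0003",
}

def enrich_supplier(extracted_name: str, master=SUPPLIER_MASTER, cutoff=0.6):
    """Map an extracted name to the closest master record (case-insensitive), or None."""
    # Compare lowercase forms so "Acme" and "ACME" are treated alike.
    lookup = {name.lower(): name for name in master}
    matches = difflib.get_close_matches(
        extracted_name.lower(), list(lookup), n=1, cutoff=cutoff
    )
    if not matches:
        return None  # no record is similar enough; leave for manual handling
    best = lookup[matches[0]]
    return {"name": best, "supplierId": master[best]}

if __name__ == "__main__":
    print(enrich_supplier("Acme Corp."))  # {'name': 'ACME Corporation', 'supplierId': 'S-0001'}
    print(enrich_supplier("xyz"))         # None
```

A higher cutoff reduces wrong matches at the cost of more names falling through to manual handling.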

Thanks for reading, and feel free to leave a comment with questions or feedback 🙂