
Introduction


SAP Intelligent Robotic Process Automation provides convenient and smart solutions to simplify the extraction of data from documents. Document Information Extraction, an SAP AI Business Services capability, has been integrated into SAP Intelligent RPA since September 2021, giving users the flexibility to choose between different information extraction options. Further information on the integration touchpoints is available in this blog.

This blog post is part of the Document Extraction series, which aims to empower the community with detailed step-by-step guides that explain the capabilities of Document Extraction within SAP Intelligent RPA. It uses a sample extraction use case to showcase rule-based capturing of information from a document.

Prerequisites



  • SAP Intelligent Robotic Process Automation platform (Trial / Full-Version)

  • Installation as per the instructions in the Help Portal

  • Basic knowledge of projects and automations. Tutorials can be found under: Tutorials


Set-up



  • Create a project in the Cloud Studio.

  • Add the following dependencies to your project as shown:

    • Document Information Extraction SDK

    • PDF SDK
      Please note: Core SDK and Excel SDK are added automatically when an automation is created.




Sample Document:



 

Business Use-Case


In the corporate world, numerous types of documents need to be processed to obtain business information. Such documents can differ from the generic types such as Invoice, Purchase Order or Payment Advice. These custom document types are difficult to automate using prebuilt procedures.

Let's look at a scenario where a company receives multiple Power of Attorney documents from different associates. The company also maintains a database to manage the associates by storing the complete document text, Shipper Number, Exporter Identification Number and Date.

We will simplify and realize this use case by using the new "Open Document (Online OCR)" activity along with some pre-existing activities.

Steps to simplify this use-case



  1. Create an automation.

  2. Drag and drop the Open Document (Online OCR) activity. This activity can open machine-readable or scanned documents in PDF or image format. Provide the document path as shown in the image.



  3. To grab the complete text in a document, Get Text (PDF) can be used. Drag and drop this activity.

  4. Drag and drop the Get Text After (PDF) activity. This activity allows users to fetch the text after a specified search string. The number of words to be extracted can be controlled using the numWords parameter. Provide the search string and number of words parameters as shown in the image.

  5. Similarly, add Get Text After (PDF) activities for Exporter Identification Number and Date as shown below:

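  6. Now that the required steps are in the automation, add two Log activities to view the result.

    You can put the following messages in the logs to view the extracted fields. The names Step2 to Step5 follow the numbering of the activities inside the automation (Open Document (Online OCR) being the first activity), not the numbering of this list: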
    "Power of Attorney complete: " + Step2.textContent

    "Shipper: " + Step3.outputValue + " EIN: " + Step4.outputValue + " Date: " + Step5.outputValue

     

  7. Test the automation to view the extraction result. The result should be visible in the Test Console.
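
To make the idea behind Get Text After (PDF) in steps 4 and 5 more concrete, here is a minimal sketch in plain JavaScript that can be run outside Cloud Studio. It only approximates the behavior described above (take a given number of words after a search string) and is not the SDK's implementation. The helper name getTextAfter, the whitespace-based word splitting, and the sample text are assumptions made for illustration.

```javascript
// Illustrative sketch only: approximates "text after a search string,
// limited to numWords words". getTextAfter is a hypothetical helper,
// not part of the SAP Intelligent RPA SDK.
function getTextAfter(text, searchString, numWords) {
  const start = text.indexOf(searchString);
  if (start === -1) {
    return ""; // search string not found
  }
  const rest = text.slice(start + searchString.length).trim();
  // Assumption: words are separated by whitespace.
  return rest.split(/\s+/).slice(0, numWords).join(" ");
}

// Made-up sample text standing in for the text of a Power of Attorney document.
const sampleText =
  "POWER OF ATTORNEY Shipper Number: SH-1001 " +
  "Exporter Identification Number: 12-3456789 Date: 01 Oct 2021";

const shipper = getTextAfter(sampleText, "Shipper Number:", 1);
const ein = getTextAfter(sampleText, "Exporter Identification Number:", 1);
const date = getTextAfter(sampleText, "Date:", 3);

// Same message shape as the second log message in step 6.
const message = "Shipper: " + shipper + " EIN: " + ein + " Date: " + date;
console.log(message);
```

In the real automation, the search string and numWords are set in the activity's parameters, so no script is needed; the sketch is only meant to show why the choice of numWords matters (one word for the identifiers, three for a date written as day, month, year in this made-up sample).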


More Activities


All the existing activities described in the blogs below can be used with the new Open Document (Online OCR) activity.

Blog1 Blog2

The difference lies in the first activity: Open Document (Online OCR) or Open PDF. Open PDF can only open machine-readable PDFs and does not work with scanned images or documents. With the new Open Document (Online OCR) activity, you can open scanned images or documents as well.
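
The choice between the two activities can be summarized as a small decision rule. The following plain JavaScript sketch is for illustration only; the function name chooseOpenActivity, the isMachineReadable flag, and the list of image extensions are assumptions and are not taken from the SDK.

```javascript
// Illustrative decision helper; all names and the extension list are assumptions.
const IMAGE_EXTENSIONS = [".png", ".jpg", ".jpeg", ".tif", ".tiff"];

function chooseOpenActivity(fileName, isMachineReadable) {
  const lower = fileName.toLowerCase();
  const isPdf = lower.endsWith(".pdf");
  const isImage = IMAGE_EXTENSIONS.some((ext) => lower.endsWith(ext));

  if (isPdf && isMachineReadable) {
    // Both activities can open this document; Open PDF is sufficient.
    return "Open PDF";
  }
  if (isPdf || isImage) {
    // Scanned PDFs and images need the OCR-based activity.
    return "Open Document (Online OCR)";
  }
  return null; // other formats are outside the scope of this post
}

console.log(chooseOpenActivity("power_of_attorney_scan.pdf", false));
```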

Conclusion


By reading this blog post, you have learned about the new Open Document (Online OCR) activity and its usage. In addition, you got a basic overview of how simple and convenient activities can be used to extract information from documents.

Thanks for reading and feel free to leave a comment with questions or feedback 🙂

 

Find more information on SAP Intelligent RPA:


Exchange knowledge: SAP Community | Q&A | Blog

Learn more: Webinars | Help Portal | openSAP

Explore: Product Information | Successful Use Cases

Try SAP Intelligent RPA for Free: Trial Version | Pre-built Bots

Follow us on: LinkedIn, Twitter and YouTube
12 Comments
morris91
Explorer
0 Kudos
Hi and thanks for this guide.
I've run into an issue at step 2. After providing the documentPath for the document to open and testing the automation, I get this error:

"Could not upload document for information extraction: 400 "Invalid client ID(s). The provided client ID(s) does/do not exist."".

I'm sure that the path is correct, any suggestion?
0 Kudos
Hi,

Thanks for trying out the steps. It seems that the problem is with the subscription. Can you confirm whether your subscription plan is TRIAL or OTHER?

Thanks,
Simar
morris91
Explorer
0 Kudos
Yes, I've got a trial plan.

Thanks,
Maurizio
0 Kudos

If possible, you can try to unsubscribe from and re-subscribe to the SAP Intelligent Robotic Process Automation application. Please note that this will delete the data in your current tenant.

Otherwise, you can create a support ticket: here

Thanks,
Simar

morris91
Explorer
With a colleague's account it works perfectly, but with mine I get the same error, even after deleting my trial account and creating another one. So at this point I'll create a support ticket.

Thanks,
Maurizio

 
will_conlon
Product and Topic Expert
Thank you singh.simar!! Much appreciated!!
nikhilbansal
Explorer
0 Kudos

Hi morris91

I am facing the same issue and it is not working for my colleague as well.

Can you please let me know if you got a resolution from the Support team?

morris91
Explorer
0 Kudos
Hi,
I'm sorry, but I didn't contact the Support team in the end.
0 Kudos
Hello,

Sorry for the inconvenience. The issue should be fixed now with the latest hotfix.

Thanks,
Simar
morris91
Explorer
0 Kudos
Yes it is fixed!

 

Thanks a lot
rogier1234
Member
0 Kudos
Hi Simardeep and Maurizio!

I am trying automated invoice extraction with IRPA and have been getting the exact same error:

(Could not upload document for information extraction: 400 "Invalid client ID(s). The provided client ID(s) does/do not exist.").

I have tried re-subscribing to the SAP IRPA application and all other applications in the process, created a new trial account, repeated the steps, etc., and I still get the same error. Of course I have also checked the client IDs that I put in my JSON file; they are correct. Do you have a solution, or do you know what causes this?

Regards
Rogier
melanie_lauber
Active Participant
0 Kudos
I am facing the same issue. I've done everything fresh from scratch, but I get the exact same error. Any more hints on this?