DOX - Process of training an extended Invoice schema - country specific

Pyc · 07-20-2023

G'day all,

I've read some questions along the same lines with great responses from Tomasz Janasz but I'm still unclear.

I want to extend the standard Invoice Schema to include a few Australian standard fields (BSB/BPay Biller/BPay Ref).

The out of the box pretrained model seems to do pretty well on a wide variety of Invoices thrown at it. No training required as the name would suggest. Happy days.

But to add the required additional fields I need to create an extension schema, and that then needs training.

A few questions:

What are the field-level "Default Extractors" and how can I make one? How can I instruct the general logic on how to look for a BSB or BillerCode without disrupting all the good default logic of "Pretrained" (knowing that the Template logic still sits on top of Pretrained)?

Given that I want to use "Detect Automatically" (because the alternative is unthinkable), how do I get my desired result given that "BSB" and "BPayBiller/BPayRef" are mutually exclusive? No invoice would ever have both, but all Australian invoices will have one or the other.
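
To make that constraint concrete, here is a rough sketch of the check I'd expect to hold on the extracted fields for each invoice. It is plain Python, not anything from the DOX API: the dict keys are just the field names from my extension schema, and `payment_method` is a made-up helper name. The only format rule it assumes is that a BSB is six digits, usually written as NNN-NNN.

```python
import re

# Field names as used in the extension schema described above (my own naming).
BSB_FIELD = "BSB"
BPAY_FIELDS = ("BPayBiller", "BPayRef")


def payment_method(fields: dict) -> str:
    """Return "BSB" or "BPAY" for one invoice's extracted fields.

    Raises ValueError when both methods appear, or when neither is complete.
    """
    def present(name: str) -> bool:
        value = fields.get(name)
        return value is not None and str(value).strip() != ""

    has_bsb = present(BSB_FIELD)
    has_any_bpay = any(present(name) for name in BPAY_FIELDS)
    has_full_bpay = all(present(name) for name in BPAY_FIELDS)

    # The two payment methods are mutually exclusive.
    if has_bsb and has_any_bpay:
        raise ValueError("both BSB and BPay details found")

    if has_bsb:
        # Accept "062-000", "062 000" or "062000"; reject anything else.
        digits = re.sub(r"[\s-]", "", str(fields[BSB_FIELD]))
        if not re.fullmatch(r"\d{6}", digits):
            raise ValueError("BSB should be six digits")
        return "BSB"

    if has_full_bpay:
        return "BPAY"

    raise ValueError("neither BSB nor complete BPay details found")
```

Something like `payment_method({"BSB": "062-000"})` would flag the invoice as a BSB one, and any invoice that fails the check would go to manual review.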

My concern is that if I start adding Templates for some vendors to train for these fields, I'll need a Template per vendor. This seems crazy given how well it works with no training to begin with. There is very little "I" in the "AI" if that's the case.

I do wonder if a custom field level extractor is the answer, but I haven't found any doco on it, or any real understanding from other answers I've seen.

Thanks for any and all advice!

Have fun,

Mark

tomasz_janasz · 07-20-2023

Hi Mark,

The current Extension feature is layout-based (Template). In other words, to extract additional custom fields you need to annotate each distinct layout to show the system where the key-value pair is located on the document.

Default Extractors correspond to the fields that are managed by SAP (the ones that we train the models on). This means that you have the option to combine your own manual annotation with the power of the underlying ML algorithms.

"Detect Automatically" helps to find/detect the correct template that corresponds with the layout of the document that you upload to the service. The value is that you need to create a Template and annotate the corresponding (custom) fields only once, and the extraction is then done automatically by the service.

I hope that you will find some "I" in Document Information Extraction 🙂

Feel free to reach out if you have further questions.

Regards,
Tomasz (PM)
