Issue with number format

bkmarzec · 03-28-2023

We are using Document Information Extraction to process Purchase Orders in PDF format. Since we supply the European market, numbers in the documents are specified with a decimal comma, e.g. 12.345,67.

However, we have noticed that the period used as a thousands separator in the PDFs is consistently misinterpreted as a decimal point by Document Information Extraction.

For example, if the PDF contains a delivery quantity of 13.460 in a line item, Document Information Extraction returns the following (13.46):

{
  ...
  "documentType": "purchaseOrder",
  "languageCodes": ["de"],
  ...
  "extraction": {
    "lineItems": [
      [
        {
          "name": "quantity",
          "category": "details",
          "value": 13.46,
          "rawValue": "13.46",
          "type": "number",
          "page": 1,
          "confidence": 0.9902912621359221,
          "label": "quantity"
        },
        ...
      ]
    ]
  }
}

whereas I would expect 13460.
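To make the two readings concrete, here is a minimal Python sketch of how the same string parses under each convention. The helper `parse_number` is hypothetical and is not part of the Document Information Extraction API. It would also have to be applied to the text as printed in the PDF, since the returned rawValue ("13.46") has already lost the trailing zero.

```python
from decimal import Decimal


def parse_number(text: str, decimal_sep: str = ",", group_sep: str = ".") -> Decimal:
    """Parse a formatted number string using explicit separators.

    Hypothetical helper for illustration only. The defaults follow the
    German convention: '.' groups thousands, ',' marks the decimals.
    """
    # Drop the grouping separator first, then normalize the decimal separator.
    cleaned = text.strip().replace(group_sep, "")
    if decimal_sep != ".":
        cleaned = cleaned.replace(decimal_sep, ".")
    return Decimal(cleaned)


# German convention: the period is a thousands separator.
expected = parse_number("13.460")

# English convention: the period is a decimal point (what the service returns).
returned = parse_number("13.460", decimal_sep=".", group_sep=",")

print(expected, returned)
```

With the German defaults, "13.460" parses to 13460 and "12.345,67" to 12345.67. With the separators swapped, "13.460" is numerically equal to 13.46, which matches the value in the response above.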

I tried changing the template and confirmed multiple documents with the corrected value, but it didn't work.

Is there any other option to specify or train the number interpretation?

zeise · ‎03-29-2023

Hi Bartlomiej,

Could you please open a ticket on the component CA-ML-BDP, briefly describe the context of your scenario, and attach one or two example documents so we can investigate the issue?

Best regards,
Manuel
