Automation of AP Invoices using Cloud OCR SDK

  • Last Post 14 August 2015
philhawthorn posted this 10 August 2015

I am currently evaluating your Cloud OCR SDK for the viability of automating AP Invoices in our application in Salesforce.

I am successfully using the processImage method to upload the image and receive the document back as XML, which gives me the coordinates and all of the text. What I would like to know is whether it is possible to build templates to 'recognise' certain fields. As you know, invoices come in all shapes and sizes, and I would like to cut down on the amount of human intervention. For example, two invoices from the same supplier have similar, but not identical, coordinates for their text fields, and where things end up on the page also depends on how many product lines are invoiced.
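As an illustration of working with that XML, here is a minimal Python sketch that pulls text lines and their bounding boxes out of a result. The sample document is a simplified, hand-written stand-in, not real SDK output; the element and attribute names (`line`, `l`, `t`, `r`, `b`) are assumptions that should be checked against an actual response, and `extract_lines` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample (an assumption, not real SDK output).
SAMPLE_XML = """<document>
  <page width="1000" height="1400">
    <block blockType="Text" l="600" t="90" r="900" b="160">
      <text><par>
        <line l="600" t="90" r="900" b="120"><formatting>Invoice No: 10045</formatting></line>
        <line l="600" t="130" r="880" b="160"><formatting>Total: 250.00</formatting></line>
      </par></text>
    </block>
  </page>
</document>"""


def local_name(tag):
    """Drop any '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(xml_text):
    """Return a list of {'text', 'box'} dicts, one per <line> element."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "line":
            continue
        # Bounding box as (left, top, right, bottom) in pixels.
        box = tuple(int(elem.get(key)) for key in ("l", "t", "r", "b"))
        # Join all nested text, whether stored per line or per character.
        text = "".join(elem.itertext()).strip()
        lines.append({"text": text, "box": box})
    return lines


for line in extract_lines(SAMPLE_XML):
    print(line["box"], line["text"])
```

Flattening the result into plain (text, box) records like this keeps any later field-matching logic independent of the XML layout.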

I looked at the processFields and processTextField methods, but I have to supply coordinates and, as I mentioned, these aren't reliable enough to pinpoint a particular field. In addition, it would mean creating a different template for each supplier.

So, my questions are these:

  1. Is it realistic to implement my own templating type functionality? Would that generally be expected functionality?
  2. If so, is it possible to use some kind of tolerance for the coordinates - or do they have to be exact?
  3. Or, is this too ambitious a goal given the methods available in the Cloud OCR SDK currently?

Thanks!

Oksana Serdyuk posted this 11 August 2015

I've forwarded your contact info to my colleagues in our office in your region. They will contact you soon to discuss your questions and find an individual approach to the issue.

Jamesw posted this 13 August 2015

I am attempting the same thing. Can I discuss with someone, or can an answer be put here?

Thanks

Oksana Serdyuk posted this 14 August 2015

My colleagues from your regional office will contact you soon to find out the details of your usage scenario.

philhawthorn posted this 14 August 2015

Here is what I have learnt so far, in answer to my own questions:

Q: Is it realistic to implement my own templating type functionality? Would that generally be expected functionality?

A: Not really; the effort involved would not be practical. I have been advised that there is other ABBYY technology (FlexiLayouts?) that achieves this, but it is not available via the Cloud OCR SDK.

Q: If so, is it possible to use some kind of tolerance for the coordinates - or do they have to be exact?

A: Not applicable, given the answer to the first question.

Q: Or, is this too ambitious a goal given the methods available in the Cloud OCR SDK currently?

A: From what I have been led to believe, I would conclude that yes, it is currently too ambitious.

philhawthorn posted this 14 August 2015

Added my thoughts/findings below @Jamesw

Jamesw posted this 14 August 2015

Thanks for the reply. I've also done some thinking on this.

FlexiLayouts is of course a good feature that would be great for cloud, but for now I believe that it's not too ambitious to "template" a supplier's layout in a very basic way. However, it's worth noting that our solution is not for a one-off customer, so we are willing to throw more resources at it.

From a business point of view, the extracted values would still need verification (for example, before posting to your accounts package), regardless of the accuracy of the OCR. Therefore, you would ALWAYS be providing a "best efforts" solution to the accounting dept.

The templates themselves would require a well-designed, highly responsive interface so that even less computer-literate users can make a new template in minutes: click and drag to select areas, specify which data is in each selected area, save, done.
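A minimal Python sketch of such a basic template, assuming the OCR result has already been reduced to words with pixel boxes. Everything here is hypothetical illustration, not part of the Cloud OCR SDK: the `Region` class, the field names, the sample coordinates, and the choice to match a word when its centre falls inside a saved region grown by a tolerance.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One user-drawn selection area, labelled with the field it holds."""
    field: str
    left: int
    top: int
    right: int
    bottom: int


def words_in_region(words, region, tolerance=15):
    """Join the words whose centre lies in the region grown by `tolerance` px."""
    hits = []
    for text, (l, t, r, b) in words:
        cx, cy = (l + r) / 2, (t + b) / 2
        inside_x = region.left - tolerance <= cx <= region.right + tolerance
        inside_y = region.top - tolerance <= cy <= region.bottom + tolerance
        if inside_x and inside_y:
            hits.append((l, text))
    # Left-to-right reading order within the region.
    return " ".join(text for _, text in sorted(hits))


def apply_template(words, regions, tolerance=15):
    """Map each template field to the text found in its region."""
    return {r.field: words_in_region(words, r, tolerance) for r in regions}


# Template saved from one invoice of a supplier (made-up coordinates).
template = [
    Region("invoice_number", 700, 90, 900, 120),
    Region("invoice_total", 700, 1200, 900, 1230),
]

# Words from a second invoice: (text, (left, top, right, bottom)).
words = [
    ("10045", (712, 98, 790, 126)),       # shifted by a few pixels
    ("250.00", (720, 1236, 800, 1262)),   # pushed down by extra product lines
    ("Widget", (100, 600, 200, 630)),
]

print(apply_template(words, template))
print(apply_template(words, template, tolerance=30))
```

With the default tolerance the slightly shifted invoice number is still found, but the total is missed because the extra product lines moved it further than the tolerance allows. Widening the tolerance recovers it here, at the cost of a higher risk of picking up neighbouring text, which is why the extracted values would still need verification.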

philhawthorn posted this 14 August 2015

I agree to some extent. I can certainly have users 'click' on the Invoice #, Invoice Total etc., but I wanted more automation than that. I played with two invoices from the same supplier: whilst the pixel locations were similar, they were not identical, so even locating fields by pixel location would involve some effort, and then you'd have to do this for each different format. Maybe I'm making too many assumptions about what FlexiLayouts would give you, but I assumed that it was more intelligent than that. Also, can you confirm which platform you're developing on? Thanks
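One way to reduce the dependence on absolute pixel locations is to search for a label and read the value next to it. The Python sketch below illustrates that idea only; it is not how FlexiLayouts works and not a Cloud OCR SDK feature. The function name `find_value_right_of`, the `max_dy` threshold, and the sample word boxes are all assumptions made for the example.

```python
def find_value_right_of(words, label, max_dy=10):
    """Return the nearest word to the right of `label` on roughly the same row.

    `words` is a list of (text, (left, top, right, bottom)) tuples.
    Returns None when the label is missing or nothing sits to its right.
    """
    anchors = [box for text, box in words if text.lower() == label.lower()]
    if not anchors:
        return None
    _, a_top, a_right, a_bottom = anchors[0]
    anchor_cy = (a_top + a_bottom) / 2

    candidates = []
    for text, (l, t, r, b) in words:
        same_row = abs((t + b) / 2 - anchor_cy) <= max_dy
        if same_row and l >= a_right:
            # Rank by horizontal gap between the label and the candidate.
            candidates.append((l - a_right, text))
    return min(candidates)[1] if candidates else None


# Two invoices from the same supplier; the total sits at different heights
# because the second one has more product lines (made-up coordinates).
invoice_a = [
    ("Date:", (600, 100, 660, 130)),
    ("10/08/2015", (680, 100, 820, 130)),
    ("Total:", (600, 1200, 680, 1230)),
    ("250.00", (700, 1200, 800, 1230)),
]
invoice_b = [
    ("Total:", (590, 1320, 670, 1350)),
    ("1,480.50", (695, 1322, 815, 1352)),
]

print(find_value_right_of(invoice_a, "Total:"))
print(find_value_right_of(invoice_b, "Total:"))
```

Because the lookup is relative to the label, the same call works on both invoices even though the total moved by more than 100 pixels. It still relies on the label text being recognised exactly and on the value sitting to its right, so each supplier format may need its own labels.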
