Limiting the domain with Regular Expression

  • Last Post 30 March 2012
  • Topic Is Solved
dante posted this 29 March 2012

I just signed up for the SDK and am testing it out with my product. It works pretty well so far, but I have a question regarding the regular expression functionality of processTextField.

One of the fields I need to recognize is a name field. These names are actually the names of my users and therefore I know the entire domain of possible entries. I'd like to pass this domain to the SDK to give it a better chance of a positive match. I'm trying to use the regExp parameter of processTextField to accomplish this, but it seems to have no effect on the outcome. I have tried passing

/(name1)|(name2)/i

and

(name1)|(name2)

and even

(name)

None of these seems to have any effect on the outcome. Any guidance or suggestions would be appreciated!

Vasily Panferov posted this 30 March 2012

Specifying a regular expression doesn't actually force the recognition engine to always use it in the results. It is possible to get output that is completely different from what the regexp specifies.

When the recognizer has several hypotheses about how to recognize a given word, it checks all of them against the regular expression. If a recognition variant conforms to the regexp, it has a higher probability of being selected as the final recognition output. But if no variant matches the regular expression, the result cannot conform to it.
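To make the behavior above concrete, here is a minimal Python sketch of the general idea. It is only an illustration, not the engine's actual implementation: the function name `pick_hypothesis`, the confidence values, and the fixed `bonus` are all assumptions made up for this example.

```python
import re

def pick_hypothesis(hypotheses, pattern, bonus=0.2):
    """Pick the final text from (text, confidence) pairs.

    Hypothetical model: a variant that fully matches the pattern gets a
    fixed score bonus. The pattern never creates new variants.
    """
    regex = re.compile(pattern, re.IGNORECASE)

    def score(item):
        text, confidence = item
        return confidence + (bonus if regex.fullmatch(text) else 0.0)

    return max(hypotheses, key=score)[0]

pattern = r"(dante)|(vasily)"

# A matching variant can win even with a slightly lower raw confidence.
print(pick_hypothesis([("Dantc", 0.55), ("Dante", 0.50)], pattern))

# No variant matches, so the output cannot conform to the pattern.
print(pick_hypothesis([("Dantc", 0.55), ("Oante", 0.40)], pattern))
```

The second call shows why a regexp can appear to have no effect: if the correct reading is not among the engine's candidates, the pattern has nothing to promote.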

However, there are other options besides regular expressions for improving recognition quality. For example, the engine internally has many specialized dictionaries for names in different languages.

You can send your image snippet to cloudocrsdkbeta@abbyy.com. We'll take a look at it and can probably suggest some options or add something to the cloud API to get better results.

dante posted this 30 March 2012

Okay, thanks, this makes sense. Just to check, is the first regexp I listed, i.e. /(name1)|(name2)/i, valid? If so, I'll use that and then do other checks on my side if it does not come back with a known name.

I'm still just testing, so I don't have very good samples to share, but I'll keep your offer in mind and send some once I have better data. Thanks for your help!
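A client-side check like the one mentioned above could look roughly like this in Python. This is a sketch under assumptions: `resolve_name`, the sample user list, and the 0.75 similarity cutoff are hypothetical and would need tuning against real recognition results. It relies only on the standard library's `difflib.get_close_matches`.

```python
import difflib

def resolve_name(ocr_text, known_names, cutoff=0.75):
    """Map OCR output to a known user name, or return None if nothing is close."""
    # Compare case-insensitively, but return the name in its original spelling.
    by_lower = {name.lower(): name for name in known_names}
    key = ocr_text.strip().lower()
    if key in by_lower:
        return by_lower[key]
    # Fall back to the single closest known name above the similarity cutoff.
    close = difflib.get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[close[0]] if close else None

users = ["Dante", "Vasily"]          # hypothetical list of known user names
print(resolve_name("Dantc", users))  # near miss, resolved to a known name
print(resolve_name("xyz", users))    # nothing close, so the caller must handle None
```

Returning None rather than the nearest guess keeps unknown results visible, so they can be flagged for manual review instead of being silently assigned to the wrong user.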
