11 December 2014
> Is there any limit on how many language parameters can be used at the same time?
There is no such limit; you can specify several languages, separated by commas, in the language parameter of any processing method. The recognition service will automatically select the most appropriate language for the document from the specified set.
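As an illustration of the comma-separated form, here is a minimal Python sketch that builds a request URL whose language parameter lists several languages. The endpoint `https://ocr.example.com/processImage`, the helper `build_process_url`, and the exact language names are hypothetical placeholders; check the real processing method URL and the accepted language names in the service documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint used only for illustration; substitute the real
# processing method URL of the service you are calling.
PROCESS_URL = "https://ocr.example.com/processImage"


def build_process_url(languages):
    """Build a request URL whose `language` parameter lists several
    languages separated by commas (hypothetical helper)."""
    if not languages:
        raise ValueError("at least one language is required")
    # Join the candidate languages into one comma-separated value and
    # keep the commas unescaped in the query string.
    query = urlencode({"language": ",".join(languages)}, safe=",")
    return f"{PROCESS_URL}?{query}"


print(build_process_url(["English", "German", "French"]))
```

The service then picks the best-matching language from that set on its own; the client only has to send the list.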
> Is there a performance penalty for each additional language parameter?
No, there is no extra charge: you pay only for the pages you recognize.
> What will happen if none of the given language parameters matches the language used in the image/PDF being processed?
The document will be recognized using the languages you specified; the service will select the most similar one. For each character, one recognition language is chosen, together with the best-fitting character from that language's alphabet.
> How accurately is the OCR cloud service able to choose the right language from the given parameters?
The recognition language is selected from the list of languages specified in the language parameter of the processing method. The service provides OCR both for languages with dictionary support and for languages without it. During recognition, the text is split into words, and each word is associated with one or several recognition languages. If a word is found in several dictionaries, the recognition variant is taken from the dictionary in which that word has the better quality rating; every word in every dictionary has a defined quality rating.
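The dictionary-based choice described above can be modelled with a toy Python sketch: each language's dictionary maps a word to a quality rating, and when a word appears in several dictionaries the language whose dictionary rates it highest wins. This is only an illustration of the selection rule, not the service's actual implementation; the dictionaries, the ratings, and `pick_language` are all invented for the example.

```python
# Hypothetical per-language dictionaries: word -> quality rating.
# The ratings are made-up numbers for illustration only.
DICTIONARIES = {
    "English": {"hand": 0.9, "kind": 0.7},
    "German": {"hand": 0.6, "kind": 0.95},
}


def pick_language(word, languages, dictionaries):
    """Return the language (among `languages`) whose dictionary gives
    `word` the highest quality rating, or None if no dictionary for the
    requested languages contains the word."""
    found = {
        lang: dictionaries[lang][word]
        for lang in languages
        if word in dictionaries.get(lang, {})
    }
    if not found:
        # e.g. the word is unknown, or only languages without
        # dictionary support were requested
        return None
    # On a tie, max() keeps the first language in `languages` order.
    return max(found, key=found.get)


print(pick_language("kind", ["English", "German"], DICTIONARIES))
print(pick_language("hand", ["English", "German"], DICTIONARIES))
```

Note that the choice can only fall on languages you listed, which is why the answers below recommend keeping that list as short and as accurate as possible.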
> If I increase the number of language parameters, will that cause the OCR cloud service to use the wrong language more frequently?
Yes, you understand correctly: each extra language may degrade the recognition result.
> What happens if the OCR cloud service tries to use a language different from the one in the image/PDF being processed?
Please see the answer to your third question.
> My main problem is that I need to process several PDFs in several languages without knowing the exact language of each document. At the moment, I only know the overall list of languages these documents are written in. So, will things work OK if I just process all these PDFs, sending the same list of languages as the parameter for all of them, or should I try to determine the exact language of each document beforehand so that I can send the exact language for each PDF as a parameter?
If you are able to determine the language of each document before processing it, that is the best option: recognition will be faster and the output more accurate when you specify the exact language for each document.
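A minimal Python sketch of that strategy, assuming you can sometimes determine a document's language in advance (how you do so, for example from metadata or file naming, is up to you and not shown): send just that language when it is known, and otherwise fall back to the full list of candidate languages. The names `ALL_LANGUAGES`, `language_param`, and the sample file names are hypothetical.

```python
from typing import Optional, Sequence

# Overall list of languages the documents may be written in
# (example values; replace with your own set).
ALL_LANGUAGES = ("English", "German", "French")


def language_param(detected: Optional[str],
                   candidates: Sequence[str] = ALL_LANGUAGES) -> str:
    """Return the value for the language parameter of one document.

    If the document's language is known and is one of the candidates,
    send only that language (faster, more accurate). Otherwise fall back
    to the whole comma-separated list and let the service choose.
    """
    if detected is not None and detected in candidates:
        return detected
    return ",".join(candidates)


# Hypothetical documents: file name -> language known in advance (or None).
documents = {"report.pdf": "German", "scan.pdf": None}
for name, lang in documents.items():
    print(name, language_param(lang))
```

Falling back to the full list when a detected language is not among the candidates is a deliberate choice here: it keeps every document processable, at the cost of the speed and accuracy benefits mentioned above.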