29 November 2017
The recognized text is presented as a hierarchy: document > page > block > region > etc. Please see the Output XML Document article for a detailed description of the XML output.
As the XML output also includes the coordinates of each element, you can parse the words by coordinates on your side and extract the necessary data from the output.
The basic idea is that a field's name and its value are usually close to each other on an invoice. You could therefore search the XML for keywords such as the field names, get the coordinates of those keywords, and then look for text blocks located near them (directly below, on the same line to the right, etc.); those blocks are the candidate field values. To find words that belong to one line, compare their baseline coordinates: if the baselines differ by only a few pixels, the words are on the same line.
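The coordinate search described above could look roughly like the Python sketch below. It assumes the words have already been read out of the XML into simple objects; the Word class, its field names, the helper functions and the 5-pixel tolerance are all made up for illustration and are not part of the XML format or of any SDK.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    # Hypothetical container: fill it from the coordinates found in the XML output.
    text: str
    left: int
    top: int
    right: int
    bottom: int
    baseline: int


def same_line(a: Word, b: Word, tolerance: int = 5) -> bool:
    # Words share a line when their baselines differ by at most `tolerance` pixels.
    return abs(a.baseline - b.baseline) <= tolerance


def value_right_of(keyword: Word, words: List[Word], tolerance: int = 5) -> Optional[Word]:
    # Nearest word on the same line that starts at or after the keyword's right edge.
    candidates = [
        w for w in words
        if w is not keyword and same_line(keyword, w, tolerance) and w.left >= keyword.right
    ]
    return min(candidates, key=lambda w: w.left - keyword.right, default=None)


def value_below(keyword: Word, words: List[Word]) -> Optional[Word]:
    # Nearest word below the keyword whose horizontal extent overlaps the keyword's.
    candidates = [
        w for w in words
        if w is not keyword
        and w.top >= keyword.bottom
        and w.left < keyword.right
        and w.right > keyword.left
    ]
    return min(candidates, key=lambda w: w.top - keyword.bottom, default=None)


# Invented sample data: one label with its value to the right, one with its value below.
words = [
    Word("Date:", 100, 200, 160, 220, 218),
    Word("29.11.2017", 170, 199, 260, 221, 219),
    Word("Total", 100, 300, 150, 320, 318),
    Word("150.00", 105, 330, 165, 350, 348),
]

date_value = value_right_of(words[0], words)
total_value = value_below(words[2], words)
```

The tolerance and the "nearest candidate wins" rule are starting points only; they would probably need tuning to the resolution and layout of your own invoices.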
Additionally, you could check potential field values against appropriate regular expressions (e.g. date format for dates, words starting with capital letters for names, etc.) to accept or reject different variants.
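A minimal Python sketch of such a check is shown below. The patterns are deliberately simple examples and the names PATTERNS and first_match are hypothetical; you would need to adapt the expressions to the date, name and amount formats that actually occur in your documents.

```python
import re
from typing import List, Optional

# Example patterns only; adjust them to the formats used in your invoices.
PATTERNS = {
    "date": re.compile(r"\d{1,2}[./-]\d{1,2}[./-]\d{2,4}"),
    "name": re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"),
    "amount": re.compile(r"\d+(?:[.,]\d{2})?"),
}


def first_match(field_type: str, candidates: List[str]) -> Optional[str]:
    # Return the first candidate that matches the whole pattern, or None if all are rejected.
    pattern = PATTERNS[field_type]
    for text in candidates:
        cleaned = text.strip()
        if pattern.fullmatch(cleaned):
            return cleaned
    return None


date_text = first_match("date", ["Invoice", "29.11.2017"])
name_text = first_match("name", ["john smith", "John Smith"])
rejected = first_match("date", ["n/a"])
```

Using fullmatch rather than search means a candidate is accepted only if the entire text fits the pattern, which helps reject blocks that merely contain something date-like.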