Preserving White Space

  • Last Post 07 April 2012
peetj posted this 05 April 2012

The content appears to be converted correctly; however, white space is partly collapsed. Is there a way of preserving it? I am converting a song sheet that uses a two-line format: chords on one line, lyrics on the next, and so on. The spacing between the chords is not being preserved.



Nikolay_Kh posted this 05 April 2012

Hey Pete, please provide a sample image so we can better understand your task. Thanks!

Vasily Panferov posted this 05 April 2012

And please describe your use case. Which output format do you use: text, docx, pdf, or xml? What do you do with the results after that?

The obvious answer is to use xml and check the line coordinates, but that may not fit well into your workflow.
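To illustrate the line-coordinate idea, here is a minimal Python sketch that groups recognized characters into text lines by their vertical position, which keeps a chord line separate from the lyric line beneath it. It assumes the characters have already been extracted from the XML as hypothetical `(char, left, top)` pixel tuples; that tuple layout and the 10-pixel `tolerance` are illustrative assumptions, not the service's actual output format.

```python
from typing import List, Tuple

# Hypothetical layout: (character, left_px, top_px), extracted beforehand from the XML.
Char = Tuple[str, int, int]

def group_into_lines(chars: List[Char], tolerance: int = 10) -> List[List[Char]]:
    """Group characters whose top coordinates lie within `tolerance` pixels."""
    lines: List[List[Char]] = []
    # Visit characters top to bottom, then left to right.
    for ch in sorted(chars, key=lambda c: (c[2], c[1])):
        # Join the current line if this glyph is vertically close to its first glyph.
        if lines and abs(ch[2] - lines[-1][0][2]) <= tolerance:
            lines[-1].append(ch)
        else:
            lines.append([ch])
    # Within each line, order characters left to right.
    return [sorted(line, key=lambda c: c[1]) for line in lines]

chars = [("G", 10, 130), ("A", 10, 100), ("i", 20, 131), ("D", 60, 101)]
print(["".join(c[0] for c in line) for line in group_into_lines(chars)])
# ['AD', 'Gi']
```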

peetj posted this 05 April 2012

I would if I could upload it; I'm being told my karma is below 60, whatever that means.

The use case is simple: upload a PDF chord sheet and get text back. For example:

Notice that on the chord lines there is whitespace between the chords:

       A                             Dsus4         D/A    A

Give me one more chance, and you'll be satisfied

I need this whitespace preserved; otherwise I will need to find another solution.



Nikolay_Kh posted this 05 April 2012

You can upload images as soon as you verify your email address (you can send the verification letter from your profile page).

Currently, our plain-text export collapses runs of multiple spaces into one. I've added a feature request for an option to disable that.
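A small illustration (not the service's own code) of why collapsing spaces breaks a chord sheet: the column where each chord starts is the only thing tying it to a syllable in the lyric line. The chord and lyric strings below are made up for the example.

```python
import re
from typing import List

chords = "A       D        E"
lyrics = "Give me one more chance"

# Collapse every run of two or more spaces to a single space.
collapsed = re.sub(r" {2,}", " ", chords)

def columns(line: str) -> List[int]:
    """Return the start column of each non-space token in a line."""
    return [m.start() for m in re.finditer(r"\S+", line)]

print(columns(chords))                          # [0, 8, 17]
print(columns(collapsed))                       # [0, 2, 4]
print([lyrics[c] for c in columns(chords)])     # ['G', 'o', 'c']
print([lyrics[c] for c in columns(collapsed)])  # ['G', 'v', ' ']
```

After collapsing, the D and E chords point at the wrong syllables even though every chord name survives.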

Meanwhile, depending on your task, you can use two approaches:

  1. Add the option exportFormat=rtf to the processImage call; rich text format preserves the spacing of the original image.

  2. Alternatively, use exportFormat=xml; the resulting file contains the coordinates of every character, so you can reconstruct the document layout however you want.
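A minimal sketch of that reconstruction for a single text line, assuming each character has been read from the XML as a hypothetical `(char, left)` pair in pixels and that one text column is roughly `col_width` pixels wide (for example, an average glyph width measured on the page; this value is an assumption you would tune). Each glyph is snapped to the nearest column and the gaps are filled with spaces.

```python
from typing import List, Tuple

def rebuild_line(chars: List[Tuple[str, int]], col_width: float) -> str:
    """Place each character at column round(left / col_width), padding with spaces."""
    out: List[str] = []
    for ch, left in sorted(chars, key=lambda c: c[1]):
        col = round(left / col_width)
        # Never move backwards: if two glyphs map to the same column, append after.
        col = max(col, len(out))
        out.extend(" " * (col - len(out)))
        out.append(ch)
    return "".join(out)

line = rebuild_line(
    [("A", 70), ("D", 360), ("s", 372), ("u", 384), ("s", 396), ("4", 408)],
    col_width=12,
)
print(repr(line))  # 'A' at column 6, 'Dsus4' starting at column 30
```

Because every column is computed from the glyph's absolute left coordinate, an imperfect `col_width` can put a chord a column off, but the error does not accumulate along the line.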

peetj posted this 05 April 2012

Thanks for your reply. I tried the rtf method. It worked quite well, although it put several chord lines at the end of a line of words instead of on their own line. I also tried xml, but that seems like overkill since it positions each letter; it would be better if it positioned each word. I think this service is only worth paying for if it gives me 99% of what I need; I'm happy to accept a 1% error rate. Cheers
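Until word coordinates are available, word positions can be derived on the client from the character boxes. A minimal sketch, assuming the characters of one line have been extracted as hypothetical `(char, left, right)` pixel tuples: neighbouring glyphs are merged while the horizontal gap between them stays within `max_gap` pixels (an illustrative threshold), and an explicit space character always ends a word.

```python
from typing import List, Optional, Tuple

def chars_to_words(
    chars: List[Tuple[str, int, int]], max_gap: int = 6
) -> List[Tuple[str, int]]:
    """Merge (char, left, right) boxes from one text line into (word, left) pairs."""
    words: List[Tuple[str, int]] = []
    prev_right: Optional[int] = None
    for ch, left, right in sorted(chars, key=lambda c: c[1]):
        if ch.isspace():
            prev_right = None  # an explicit space always ends the current word
            continue
        if prev_right is not None and left - prev_right <= max_gap:
            word, start = words[-1]
            words[-1] = (word + ch, start)  # glyph is close enough: extend the word
        else:
            words.append((ch, left))        # large gap: start a new word
        prev_right = right
    return words

print(chars_to_words([("D", 360, 370), ("/", 371, 378), ("A", 379, 389), ("A", 420, 430)]))
# [('D/A', 360), ('A', 420)]
```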


Nikolay_Kh posted this 06 April 2012

Thanks for your feedback, Pete! I'll add word coordinates to the feature request list and let you know when they're available. Meanwhile, why don't you try the docx or pdfSearchable export formats? They may offer even more accurate positioning of text.

peetj posted this 07 April 2012

Thanks for your reply. I need the export format to be text, as we convert to ChordPro, a text format for storing songs. I could cope with HTML such as the hOCR format, since I could easily transform that. But as I said before, the chords need to be positioned exactly right, which comes down to whitespace.
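For the ChordPro step itself, only the column offsets of the chords matter: ChordPro writes each chord in square brackets directly before the lyric text it belongs to. A minimal sketch, assuming each chord line is immediately followed by its lyric line and both share a monospaced column grid (the function name and sample lines are illustrative):

```python
import re
from typing import List

def to_chordpro(chord_line: str, lyric_line: str) -> str:
    """Merge a chords-over-lyrics pair into one ChordPro line, e.g. "[A]Give me"."""
    # Pad the lyric so a chord placed past its end still has a position.
    lyric = lyric_line.ljust(len(chord_line))
    parts: List[str] = []
    last = 0
    for m in re.finditer(r"\S+", chord_line):
        col = m.start()
        parts.append(lyric[last:col])   # lyric text before this chord
        parts.append(f"[{m.group()}]")  # the chord, written inline
        last = col
    parts.append(lyric[last:])          # remaining lyric text
    return "".join(parts).rstrip()

print(to_chordpro("A       D        E", "Give me one more chance"))
# [A]Give me [D]one more [E]chance
```

This is exactly where collapsed whitespace hurts: if the chord columns shift, the brackets land inside the wrong words.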