You will see that I have posted about this before, asking for suggestions on which software I can use to convert PDF to docx/odt.

I am a teacher. During my time as a researcher I wrote a lot of documents, and I regularly draw upon them to teach my students. I often have to take the text and modify it or build upon it. A lot of my material is bound up in PDFs. Sometimes I have grant applications to write where a previous draft I wrote was stored as a PDF. Converting them to text has become the bane of my life.

I am forced to use online tools because none of the software I have seems to do the trick. A lot of people keep saying pandoc. Pandoc does not convert PDF to any other format. PDF can only be the output format.

Is there a magic open source solution that I have missed?

  • observantTrapezium@lemmy.ca
    6 days ago

    I know the pain. While there are definitely solutions that work sometimes, there’s just no “one size fits all” that I’m aware of. PDFs can represent text very differently internally.

    What I did for one project where extracting the text produced a complete mess was to convert the PDF pages to images and then OCR them…
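For anyone who wants to try the "pages to images, then OCR" route described above, here is a rough sketch of how it could look. It is not the commenter's actual setup: it assumes the `pdftoppm` tool (from Poppler) and the `tesseract` command-line program are installed and on your PATH, and the function names (`render_command`, `ocr_command`, `page_number`, `ocr_pdf`) are made up for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def render_command(pdf: Path, prefix: Path, dpi: int = 300) -> list[str]:
    # pdftoppm writes one PNG per page, named <prefix>-<page number>.png
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_command(image: Path, lang: str = "eng") -> list[str]:
    # "stdout" makes tesseract print the recognised text instead of writing a file
    return ["tesseract", str(image), "stdout", "-l", lang]


def page_number(image: Path) -> int:
    # Page numbers may be zero-padded (page-01.png), so compare them as integers
    return int(image.stem.rsplit("-", 1)[1])


def ocr_pdf(pdf: Path, lang: str = "eng", dpi: int = 300) -> str:
    """Render every page of `pdf` to a PNG, OCR each one, and join the text."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(render_command(pdf, prefix, dpi), check=True)
        pages = sorted(Path(tmp).glob("page-*.png"), key=page_number)
        texts = [
            subprocess.run(
                ocr_command(page, lang),
                check=True,
                capture_output=True,
                text=True,
            ).stdout
            for page in pages
        ]
    return "\n\n".join(texts)


# Example (hypothetical file name):
# Path("draft.txt").write_text(ocr_pdf(Path("draft.pdf")))
```

The result is plain text, so layout, headings and tables are lost, and OCR quality depends on the resolution and the fonts. A higher `dpi` usually helps with small print at the cost of speed.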