[PopplerExtractorTest] Verify multicolumn PDF content (currently broken)
The PDF content extraction currently uses a physical text layout (compare poppler's `pdftotext -layout ...`), so the lines of multiple columns are interspersed in the extracted text. Add a PDF file which uses multiple columns and contains the structures required to recreate the correct text flow. Unfortunately, there is no simple way to fix this, as the `RawOrderLayout` of `Poppler::Page::text(...)` creates even worse output than the default `PhysicalLayout` (missing spaces between words, or no output at all). Also add the corresponding ODT source document.
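
The interleaving can be illustrated with a small, self-contained sketch. It does not call poppler at all: `Column`, `physicalOrder` and `readingOrder` are hypothetical helpers introduced here only to show the difference between a row-by-row (physical) rendering and the column-by-column text flow the test expects, under the assumption that each column is a plain list of lines.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one text column: its lines from top to bottom.
using Column = std::vector<std::string>;

// Row-by-row rendering, similar in spirit to a physical layout:
// line i of every column ends up on the same output line.
std::string physicalOrder(const std::vector<Column> &columns, std::size_t width)
{
    std::size_t rows = 0;
    for (const auto &column : columns) {
        rows = std::max(rows, column.size());
    }

    std::string out;
    for (std::size_t row = 0; row < rows; ++row) {
        std::string line;
        for (const auto &column : columns) {
            std::string cell = row < column.size() ? column[row] : std::string();
            assert(cell.size() <= width); // cells must fit the column width
            cell.resize(width, ' ');      // pad to a fixed column width
            line += cell;
        }
        // Drop the padding after the last column.
        line.erase(line.find_last_not_of(' ') + 1);
        out += line + '\n';
    }
    return out;
}

// Column-by-column rendering: the text flow a reader would follow.
std::string readingOrder(const std::vector<Column> &columns)
{
    std::string out;
    for (const auto &column : columns) {
        for (const auto &line : column) {
            out += line + '\n';
        }
    }
    return out;
}
```

For a page with the columns `{"A1", "A2"}` and `{"B1", "B2"}`, `physicalOrder` yields lines that each mix both columns, while `readingOrder` yields all of the first column followed by all of the second.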