Commit f9f483e6 authored by Stefan Brüns

[PopplerExtractorTest] Verify multicolumn PDF content (currently broken)

The PDF content extraction currently uses a physical text "layout" (see
poppler `pdftotext -layout ...`) when extracting the content, i.e.
lines from adjacent columns are interleaved instead of following the
reading order.
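
As an illustration of the problem only, here is a minimal sketch that does
not use poppler at all. It models a page as a list of columns and compares
row-wise (physical) output with column-wise (reading order) output. The
names `Column`, `physicalOrder` and `readingOrder` are hypothetical and are
not part of poppler or of the extractor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a column is simply its lines, top to bottom.
using Column = std::vector<std::string>;

// Physical order: walk the page row by row across all columns, so lines
// that belong to different columns end up interleaved.
std::vector<std::string> physicalOrder(const std::vector<Column>& columns)
{
    std::size_t rows = 0;
    for (const Column& c : columns) {
        rows = std::max(rows, c.size());
    }
    std::vector<std::string> out;
    for (std::size_t r = 0; r < rows; ++r) {
        for (const Column& c : columns) {
            if (r < c.size()) {
                out.push_back(c[r]);
            }
        }
    }
    return out;
}

// Reading order: emit each column completely before moving to the next.
std::vector<std::string> readingOrder(const std::vector<Column>& columns)
{
    std::vector<std::string> out;
    for (const Column& c : columns) {
        out.insert(out.end(), c.begin(), c.end());
    }
    return out;
}
```

For a page with columns `{"A1", "A2"}` and `{"B1", "B2"}`, `physicalOrder`
yields A1, B1, A2, B2, whereas a reader expects A1, A2, B1, B2.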

Add a PDF file which uses multiple columns and contains the required
structures to recreate the correct text flow.

Unfortunately, there is no simple way to fix this, as the
`RawOrderLayout` of `Poppler::Page::text(...)` creates even worse
output than the default `PhysicalLayout` (missing spaces between words,
or no output at all).

Also add the corresponding ODT source document.
parent 76f229da
Pipeline #645376 passed with stage
in 5 minutes and 24 seconds