Create plain text nodes for HTML and PDF content (dea29a32) · Commits · PIM / KItinerary

Commit dea29a32 authored Mar 26, 2021 by Volker Krause

Create plain text nodes for HTML and PDF content

This fixes extractors that relied on the implicit type conversion which
the old system special-cased for a few types.
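A minimal sketch of the idea, assuming a simplified document tree. `Node`, `toPlainText`, `addPlainTextChildren` and `collect` are hypothetical names for illustration only, not KItinerary's actual document node API. Every HTML or PDF node unconditionally gets a `text/plain` child, so an extractor that only filters on plain text now finds a matching node without any special-cased conversion.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified document node (not KItinerary's real API).
struct Node {
    std::string mimeType;
    std::string content;
    std::vector<Node> children;
};

// Placeholder conversion: drops anything between '<' and '>'.
// A real implementation would use proper HTML/PDF text extraction;
// in this toy model, PDF content is assumed to be a pre-decoded string.
std::string toPlainText(const std::string &markup) {
    std::string text;
    bool inTag = false;
    for (char c : markup) {
        if (c == '<') inTag = true;
        else if (c == '>') inTag = false;
        else if (!inTag) text += c;
    }
    return text;
}

// Recursively attach a plain text child to every HTML and PDF node.
// Children are visited first, so the newly added text node is not revisited.
void addPlainTextChildren(Node &node) {
    for (auto &child : node.children) addPlainTextChildren(child);
    if (node.mimeType == "text/html" || node.mimeType == "application/pdf") {
        node.children.push_back({"text/plain", toPlainText(node.content), {}});
    }
}

// Collect all nodes of a given type, as a type-filtered extractor would see them.
void collect(const Node &node, const std::string &type, std::vector<const Node *> &out) {
    if (node.mimeType == type) out.push_back(&node);
    for (const auto &child : node.children) collect(child, type, out);
}
```

For example, after calling `addPlainTextChildren` on an HTML node containing `<p>Flight LH 123</p>`, `collect(root, "text/plain", hits)` returns one node whose content is `Flight LH 123`.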

Counter-intuitively, this has practically no performance impact despite
doing the conversion unconditionally: if the parent type is extracted
from, the text conversion comes almost for free (i.e., the full PDF or
HTML parsing has already been done), and if the parent doesn't produce
output, content-based matching for plain text extractors always
triggers the type conversion anyway.
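The cost argument above can be modeled with a small sketch, assuming the expensive parse result is cached on the document. `LazyDocument` and the two scenario functions are hypothetical names and not KItinerary code; the "parse" is a placeholder that only counts how often it runs. In both cases the document is parsed exactly once, which is why the unconditional text conversion adds almost nothing.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical model of an HTML/PDF document whose parse result is cached.
class LazyDocument {
public:
    explicit LazyDocument(std::string raw) : m_raw(std::move(raw)) {}

    // Stand-in for full HTML/PDF parsing; runs at most once.
    const std::string &parsed() {
        if (!m_parsed) {
            ++parseCount;
            m_parsed = m_raw; // placeholder for the real parse result
        }
        return *m_parsed;
    }

    // Text conversion reuses the cached parse instead of parsing again.
    std::string plainText() { return parsed(); }

    int parseCount = 0;

private:
    std::string m_raw;
    std::optional<std::string> m_parsed;
};

// Case 1: an extractor for the parent type already parsed the document.
int parsesWhenParentExtracted() {
    LazyDocument doc("Booking ABC123");
    doc.parsed();    // parent-type extractor does the full parse
    doc.plainText(); // text conversion is a cache hit
    return doc.parseCount;
}

// Case 2: no parent extractor applies, but content-based matching for
// plain text extractors needs the text, so the parse happens regardless.
int parsesWhenOnlyTextMatching() {
    LazyDocument doc("Booking ABC123");
    const std::string text = doc.plainText();
    (void)text.find("ABC"); // stand-in for content-based matching
    return doc.parseCount;
}
```

Both `parsesWhenParentExtracted()` and `parsesWhenOnlyTextMatching()` return 1.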

parent 66dc0997
