Commit dea29a32 authored by Volker Krause

Create plain text nodes for HTML and PDF content

This makes extractors work that relied on the implicit type conversion
that the old system had special-cased for a few types.

Counter-intuitively, this has practically no performance impact despite
doing the conversion unconditionally. If the parent type is extracted
from, the text conversion comes almost for free (i.e. the full PDF or
HTML parsing is already done). If the parent doesn't produce output,
content-based matching for plain text extractors would always trigger
the type conversion anyway.
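
The idea can be illustrated with a minimal sketch. Everything in it is
hypothetical (Node, parse_html, expand_html and matching_nodes are made-up
names, not the project's actual API), and the tag stripping is a crude
stand-in for real HTML parsing. It only shows why attaching the plain text
child unconditionally costs little once the parent has been parsed.

```python
import re
from dataclasses import dataclass, field
from typing import Any, List

# Counts runs of the (stand-in) expensive parse step.
parse_calls = 0


@dataclass
class Node:
    """Hypothetical document tree node: a MIME type, content, and children."""
    mime_type: str
    content: Any
    children: List["Node"] = field(default_factory=list)


def parse_html(raw: str) -> dict:
    """Stand-in for full HTML parsing; also yields the contained text."""
    global parse_calls
    parse_calls += 1
    text = re.sub(r"<[^>]+>", " ", raw)  # crude tag stripping, for illustration only
    return {"raw": raw, "text": " ".join(text.split())}


def expand_html(raw: str) -> Node:
    """Create an HTML node and always attach a plain text child node."""
    parsed = parse_html(raw)
    node = Node("text/html", parsed)
    # The text is a by-product of the parse above, so adding the child
    # does not require parsing the document a second time.
    node.children.append(Node("text/plain", parsed["text"]))
    return node


def matching_nodes(node: Node, mime_type: str) -> List[Node]:
    """Collect all nodes in the tree that have the given MIME type."""
    found = [node] if node.mime_type == mime_type else []
    for child in node.children:
        found.extend(matching_nodes(child, mime_type))
    return found


root = expand_html("<p>Booking <b>ABC123</b></p>")
text_nodes = matching_nodes(root, "text/plain")
print(text_nodes[0].content)  # prints "Booking ABC123"
```

A plain text extractor can then match the child node directly and no longer
depends on an implicit conversion of the HTML parent.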
parent 66dc0997