Create plain text nodes for HTML and PDF content
This makes extractors work again that relied on the implicit type conversion the old system had special-cased for a few types. Counter-intuitively, doing the conversion unconditionally has practically no performance impact. If the parent type is itself extracted from, the text conversion comes almost for free (i.e. the full PDF or HTML parsing has already been done). If the parent doesn't produce any output, content-based matching for plain text extractors would trigger the type conversion anyway.
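As a rough illustration of the idea only, the sketch below attaches a plain text child node to every HTML node so that an extractor looking only for text/plain content finds it without any special-cased conversion. All names here (Node, html_to_text, expand, extract_plain_text) are hypothetical and not taken from the actual code base. Only the HTML case is shown, because PDF text extraction would need a third-party library.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List


@dataclass
class Node:
    """Hypothetical document node; not the project's real type."""
    mime_type: str
    content: str
    children: List["Node"] = field(default_factory=list)


class _TextCollector(HTMLParser):
    """Collects the character data of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the text content of `html` with whitespace normalized."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join("".join(collector.parts).split())


def expand(node: Node) -> Node:
    """Unconditionally add a text/plain child to an HTML node (assumed behavior)."""
    if node.mime_type == "text/html":
        node.children.append(Node("text/plain", html_to_text(node.content)))
    return node


def extract_plain_text(node: Node) -> List[str]:
    """A plain text extractor: it only looks at text/plain nodes in the tree."""
    found = [node.content] if node.mime_type == "text/plain" else []
    for child in node.children:
        found.extend(extract_plain_text(child))
    return found
```

With this shape, calling extract_plain_text(expand(Node("text/html", "<p>Hello <b>world</b></p>"))) reaches the text through the child node, without the extractor knowing anything about HTML.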
parent 66dc0997