plaintextextractor: autodetect encoding for text
#### Autodetect encoding feature

1. Add encoding autodetection to `plaintextextractor`, inspired by a similar algorithm in `KTextEditor`.
2. Add test files: plain-text files in the `win1251`, `gb18030`, and `euc-jp` encodings, and HTML files in the `UTF-16LE` and `win1251` encodings.
3. Remove the newline character `\n` manually AFTER decoding, not before. This is necessary for multi-byte encodings (those with little-endian and big-endian byte orders) that encode a newline character in two or more bytes, for example `000A` (2 bytes), where `0A` == `\n`; deleting the `0A` byte without processing the accompanying `00` breaks decoding. Take `UTF-16LE`: when a file is read with `readline`, the `\n` character ends up at the beginning of the next line rather than at the end of the previous one, because `QFile.readline` reads until it reaches `\n`, so the sequence `0A00` is split on `0A` at the end of the first line, and on `00` in the...
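
The newline problem can be reproduced outside the extractor. The following is a minimal sketch in Python, not the actual Qt code from this change: `read_lines_bytewise` is a hypothetical stand-in for a byte-oriented `readline` that stops after every `0A` byte, and the two helper functions are illustrative names only. It compares stripping `\n` before decoding with stripping it after decoding.

```python
import codecs


def read_lines_bytewise(raw):
    """Hypothetical stand-in for a byte-oriented readline:
    every chunk ends right after a 0x0A byte (or at end of data)."""
    chunks = []
    start = 0
    while start < len(raw):
        end = raw.find(b"\n", start)
        end = len(raw) if end == -1 else end + 1
        chunks.append(raw[start:end])
        start = end
    return chunks


def strip_then_decode(chunks, encoding):
    """Broken order: drop the 0x0A byte first, then decode each chunk.
    The orphaned 0x00 shifts the byte pairs of the following chunk."""
    return [c.rstrip(b"\n").decode(encoding, errors="replace") for c in chunks]


def decode_then_strip(chunks, encoding):
    """Order described above: decode first, remove '\n' from the decoded text.
    An incremental decoder carries a split code unit over to the next chunk."""
    decoder = codecs.getincrementaldecoder(encoding)()
    lines = []
    for chunk in chunks:
        text = decoder.decode(chunk).replace("\n", "")
        if text:
            lines.append(text)
    tail = decoder.decode(b"", final=True).replace("\n", "")
    if tail:
        lines.append(tail)
    return lines


# "ab\ncd\n" in UTF-16LE is 61 00 62 00 0A 00 63 00 64 00 0A 00.
data = "ab\ncd\n".encode("utf-16-le")
chunks = read_lines_bytewise(data)

print(chunks)                                  # first chunk ends in 0A, second starts with 00
print(strip_then_decode(chunks, "utf-16-le"))  # second line is garbled
print(decode_then_strip(chunks, "utf-16-le"))  # ['ab', 'cd']
```

In this sketch the first chunk decodes correctly either way; the damage shows up on the second line, which begins with the stray `00` left over from the split `0A00` pair.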