Don't strip 0-width Other_Format characters
These include ZWJ (Zero Width Joiner), ZWNJ (Zero Width Non-Joiner) and ZWSP (Zero Width Space), which can change the rendering of text, e.g. forcing or preventing the formation of conjunct forms in Indic scripts. Instead of stripping them, treat them as combining characters, so that they become part of the extended character in the previous character cell.

To test, compare the output of:

    printf "[\u915\u94d\u927] [\u915\u94d\u200c\u937] [\u915\u94d\u200d\u937]\n"

against the examples in Figures 12.4 and 12.5 of the Unicode Standard, from the "Explicit Virama (Halant)" and "Explicit Half-Consonants" subsections of the Devanagari section in the "South and Central Asia I" chapter (page 465 in Unicode 14).
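As an illustration of the cell assignment described above, here is a minimal sketch (not the actual terminal code) that groups a string into character cells. It assumes that every base character is one cell wide (no East Asian wide characters) and that every character in general category Cf is zero width. The name split_into_cells is hypothetical. Combining marks (Mn/Me, such as the virama U+094D) and Cf characters (ZWJ, ZWNJ, ZWSP) are appended to the previous cell rather than dropped.

```python
import unicodedata

# Assumption for this sketch: these general categories never occupy a
# cell of their own. Mn/Me are combining marks (e.g. U+094D DEVANAGARI
# SIGN VIRAMA); Cf includes ZWSP (U+200B), ZWNJ (U+200C) and ZWJ (U+200D).
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}


def split_into_cells(text):
    """Hypothetical helper: split text into cells, each holding one base
    character followed by the zero-width characters attached to it."""
    cells = []
    for ch in text:
        if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES and cells:
            # Attach to the extended character in the previous cell
            # instead of stripping it.
            cells[-1] += ch
        else:
            # Base character, or a zero-width character with no previous
            # cell to attach to: start a new cell.
            cells.append(ch)
    return cells


if __name__ == "__main__":
    samples = [
        "\u0915\u094d\u0927",        # KA + VIRAMA + DHA
        "\u0915\u094d\u200c\u0937",  # KA + VIRAMA + ZWNJ + SSA
        "\u0915\u094d\u200d\u0937",  # KA + VIRAMA + ZWJ + SSA
    ]
    for sample in samples:
        cells = split_into_cells(sample)
        print([" ".join(f"U+{ord(c):04X}" for c in cell) for cell in cells])
```

For the ZWJ sample this prints ['U+0915 U+094D U+200D', 'U+0937']: the joiner stays in the first cell next to the virama, where a shaper can still use it to select the half form, instead of being discarded.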
Parent: e7e90100