Commit c26693af authored by Luis Javier Merino, committed by Tomaz Canabrava

Don't strip 0-width Other_Format characters

These include ZWJ (Zero Width Joiner), ZWNJ (Zero Width Non-Joiner) and
Zero Width Space, which can be used to change the rendering of text,
e.g. forcing or preventing the formation of conjunct forms in Indic
scripts.

Treat them as combining characters, so they end up in an extended
character in the previous character cell.
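
As a rough illustration of the idea (not the code from this commit), the
following Python sketch groups code points into cells, attaching combining
marks and the three zero-width format characters named above to the
previous cell. The names `ZERO_WIDTH_FORMAT`, `attaches_to_previous` and
`group_into_cells` are hypothetical, and the rule is deliberately simplified
compared to what a real terminal emulator does.

```python
import unicodedata

# Assumption for this sketch: only these three Other_Format (Cf) characters
# are kept; the real change may cover a different set.
ZERO_WIDTH_FORMAT = {"\u200b", "\u200c", "\u200d"}  # ZWSP, ZWNJ, ZWJ


def attaches_to_previous(ch: str) -> bool:
    """Return True if ch should be stored in the previous character cell."""
    # Mn/Me are non-spacing and enclosing combining marks.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return True
    return ch in ZERO_WIDTH_FORMAT


def group_into_cells(text: str) -> list[str]:
    """Split text into cells: a base character plus whatever attaches to it."""
    cells: list[str] = []
    for ch in text:
        if cells and attaches_to_previous(ch):
            cells[-1] += ch  # extend the previous cell
        else:
            cells.append(ch)  # start a new cell (also when nothing precedes)
    return cells


if __name__ == "__main__":
    # KA, VIRAMA, ZWNJ, SSA: the ZWNJ stays with KA instead of being dropped.
    for cell in group_into_cells("\u0915\u094d\u200c\u0937"):
        print([f"U+{ord(c):04X}" for c in cell])
```

Under these assumptions, the sequence KA, VIRAMA, ZWNJ, SSA yields two
cells: the first holds U+0915, U+094D and U+200C, the second holds U+0937.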

To test, the output of:

printf "[\u915\u94d\u937]  [\u915\u94d\u200c\u937]  [\u915\u94d\u200d\u937]\n"

can be compared against the examples in Figures 12.4 and 12.5 of the
Unicode standard, from the "Explicit Virama (Halant)" and "Explicit
Half-Consonants" sub-sections of the Devanagari section in the "South and
Central Asia I" chapter (page 465 in Unicode 14).
parent e7e90100