This page contains examples that accompany Alan Flavell’s “I18n – text direction”. Examples that are expected to display incorrectly (i.e. not as intended) in either Mozilla or Internet Explorer 6 are in red. Read the source text to understand how it’s done!
You can specify text direction by (paired) Unicode control characters, by (paired) control characters written as numeric references, by HTML markup, or by CSS properties. Control characters are restricted to plain text and are not suitable for use with markup languages (except lrm and rlm). The preferred method for HTML is to use HTML markup. Use control characters written as numeric references only in places where no markup is possible, such as attribute values (alt, title, etc.). Occasionally it may be convenient to specify text direction via CSS; for example, to set the direction of columns in tables rather than to put a dir attribute into each and every <td>.
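As a rough sketch of the CSS case, a single rule can set the direction of a whole table column instead of repeating dir in every cell. The class name glossary, the choice of the first column, and the sample rows are assumptions made for illustration. The :first-child selector is part of CSS 2, but Internet Explorer 6 does not support it, so a class on the cells may be needed there.

```html
<!-- The style element belongs in the document head -->
<style type="text/css">
  /* Hypothetical class name: the first cell of every row is right-to-left */
  table.glossary td:first-child { direction: rtl; }
</style>

<table class="glossary">
  <tr><td lang="he">שבת</td><td>Sabbath</td></tr>
  <tr><td lang="he">מזל טוב!</td><td>Congratulations!</td></tr>
</table>
```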
In the following table, div represents any block-level element, and span represents any inline element.
Plain text (control chars) | HTML 4 (control chars) | HTML 4 (markup) | CSS 2 (properties) |
---|---|---|---|
not applicable | not applicable | <div dir=ltr> ...... </div> | direction: ltr; unicode-bidi: normal |
not applicable | not applicable | <div dir=rtl> ...... </div> | direction: rtl; unicode-bidi: normal |
U+202A ...... U+202C | &#x202A; ...... &#x202C; | <span dir=ltr> ...... </span> | direction: ltr; unicode-bidi: embed |
U+202B ...... U+202C | &#x202B; ...... &#x202C; | <span dir=rtl> ...... </span> | direction: rtl; unicode-bidi: embed |
U+202D ...... U+202C | &#x202D; ...... &#x202C; | <bdo dir=ltr> ...... </bdo> | direction: ltr; unicode-bidi: bidi-override |
U+202E ...... U+202C | &#x202E; ...... &#x202C; | <bdo dir=rtl> ...... </bdo> | direction: rtl; unicode-bidi: bidi-override |
U+200E | &lrm; | not applicable | not applicable |
U+200F | &rlm; | not applicable | not applicable |
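The rows of the table can be read as interchangeable spellings of the same request. The fragment below is a sketch of the right-to-left embedding row written three ways. The class name rtl-embed and the surrounding sentence are hypothetical.

```html
<!-- The style element belongs in the document head; rtl-embed is an assumed class name -->
<style type="text/css">
  .rtl-embed { direction: rtl; unicode-bidi: embed; }
</style>

<!-- 1. HTML markup (preferred) -->
<p>He said “<span dir="rtl" lang="he">מזל טוב!</span>” and left.</p>

<!-- 2. CSS properties -->
<p>He said “<span class="rtl-embed" lang="he">מזל טוב!</span>” and left.</p>

<!-- 3. Control characters as numeric references: U+202B opens, U+202C closes -->
<p>He said “&#x202B;מזל טוב!&#x202C;” and left.</p>
```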
If the line below is displayed as “12 11 10 9 8 7 6 5 4 3 2 1 0”, then your browser recognizes the dir attribute and it is probably ready for right-to-left text. Preferably, the line should be right-aligned.
0 1 2 3 4 5 6 7 8 9 10 11 12
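The page's own source is not reproduced here, so the following is only an assumption about how such a test line can be written: a paragraph whose base direction is set to right-to-left with the dir attribute.

```html
<!-- Each number keeps its digit order, but the numbers are laid out from right to left -->
<p dir="rtl">0 1 2 3 4 5 6 7 8 9 10 11 12</p>
```

With a right-to-left base direction, the spaces between the numbers take the right-to-left direction, which is why the sequence appears reversed while “10”, “11” and “12” stay readable.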
The control or formatting characters U+202A to U+202E are not suitable for use with HTML. If they are written directly into the source text, they interfere with the left-to-right markup and make editing or even viewing the source a nightmare. Furthermore, the bidirectional algorithm stops at newlines. It would no longer be possible to structure the source text by newlines, which could separate, for example, the paired U+202B and U+202C.
The closing U+202C or &#x202C; is sometimes implied and may be omitted like the closing </p> and </td> in HTML. Nevertheless, it is safer always to close explicitly.
To write “שבת [שאבעס]”, you can use HTML markup with <span dir=rtl> or, exceptionally, write the control characters &#x202B; and &#x202C; as numeric references. Inserting the control characters U+202B and U+202C directly results in a mess when viewing the source.
‫<B
lang="he">שבת</b>
[<I>שאבעס</i>]‬
<B
lang="he">שבת</b>
[<I>שאבעס</i>]
Never use UTF-8-encoded control characters, but only character references like &#x202B; and &rlm;.
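Attribute values cannot contain markup, so this is where numeric references are the practical choice. A small sketch follows; the file name shabbat.png is made up for the example.

```html
<!-- &#x202B; opens a right-to-left embedding, &#x202C; closes it -->
<img src="shabbat.png"
     alt="&#x202B;שבת [שאבעס]&#x202C;"
     title="&#x202B;שבת [שאבעס]&#x202C;">
```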
dir attribute

Three or more directional levels (here: Latin > Hebrew > Latin) must be defined by control characters or, preferably, by HTML markup. The third line has no dir markup and is thus displayed as having only two directional levels.
The words mean “Congratulations!”
The words “מזל טוב” mean “Congratulations!”
The words “מזל [mazel] טוב [tov]” mean “Congratulations!”
The words “מזל [mazel] טוב [tov]” mean “Congratulations!”
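A possible markup for the fully nested case, given as a sketch and not as the page's actual source: the sentence is left-to-right, the Hebrew quotation is embedded right-to-left, and each bracketed transliteration is embedded left-to-right again.

```html
<p dir="ltr">The words
  “<span dir="rtl" lang="he">מזל [<span dir="ltr">mazel</span>]
  טוב [<span dir="ltr">tov</span>]</span>”
  mean “Congratulations!”</p>
```

Because the levels are expressed by nested elements, the source can be broken across lines freely, which would not be safe with paired control characters.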
Numbers, which are always written from left to right, are likely to mess with right-to-left text. For example, “12 345” written as two separate numbers should be displayed as “345 12”. On the other hand, “12 345” written as a single number should always be displayed as “12 345”.
The first line is from Google’s Urdu interface with overall dir=rtl; the second line has proper dir markup. (Both lines are written in the restricted MacUrdu character set.)
© 2004 Google – 90 00 000 ويب صفحات كى تلاش هو رهى هے
© 2004 Google – 90 00 000 ويب صفحات كى تلاش هو رهى هے
© 2004 Google – 9 000 000 veb safahāt kī talāš ho rahī hai
Always specify the dir attribute for each piece of text, starting with <body dir=ltr> or <body dir=rtl>.
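A skeleton along these lines shows the recommendation in practice. It is a sketch under assumptions: the title is a placeholder, the Urdu sentence is borrowed from the example above, and marking the number as its own left-to-right piece is one way to read “proper dir markup”.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="ur">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>dir example</title>
</head>
<body dir="rtl">
  <!-- The number is its own left-to-right piece, so its digit groups keep their order -->
  <p><span dir="ltr">90 00 000</span> ويب صفحات كى تلاش هو رهى هے</p>
</body>
</html>
```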
bdo element

To write Hebrew letters from left to right, you need the bdo element in addition to the attribute dir=ltr.
The vowels α ε η ι ο derive from א ה ח י ע, resp.
The vowels α ε η ι ο derive from א ה ח י ע, resp.
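The following fragment sketches how the Hebrew letters can be forced into the same left-to-right order as the Greek ones. It is an illustration, not a copy of the page's source.

```html
<!-- dir=ltr on a span would leave the Hebrew run right-to-left;
     bdo overrides the inherent direction of the letters themselves -->
<p dir="ltr">The vowels α ε η ι ο derive from
  <bdo dir="ltr" lang="he">א ה ח י ע</bdo>, resp.</p>
```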
The next examples assume a right-to-left context (dir=rtl) such as an Arabic-language page. The date 31 December 1999 is to be shown in all-numeric form: 1999-12-31. The first line in each example is the one where Internet Explorer 6 fails.
The ASCII hyphen is a European number separator. Therefore, no special markup should be necessary. However, Internet Explorer 6 needs dir=ltr.
1999-12-31
1999-12-31
The non-breaking hyphen (&#x2011;) is an “other neutral” character. Therefore, markup with <bdo dir=ltr> is necessary for all browsers.
١٩٩٩‑١٢‑٣١
١٩٩٩‑١٢‑٣١
The traditional Arabic date format calls for the slash as separator and the suffix م (mīlād = birth), meaning “AD”. The slash is a common number separator. Therefore, no special markup should be necessary. However, Internet Explorer 6 needs <bdo dir=ltr>.
١٩٩٩/١٢/٣١ م
١٩٩٩/١٢/٣١ م
Use the attribute dir=ltr with European digits and the tag <bdo dir=ltr> with Arabic-Indic digits.
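Put together, the rule might look as follows inside a right-to-left page. This is a sketch; the enclosing div merely stands in for the right-to-left context assumed above.

```html
<div dir="rtl">
  <!-- European digits: an embedding via the dir attribute is enough -->
  <p><span dir="ltr">1999-12-31</span></p>

  <!-- Arabic-Indic digits with non-breaking hyphens: override the direction -->
  <p><bdo dir="ltr">١٩٩٩&#x2011;١٢&#x2011;٣١</bdo></p>

  <!-- Slash-separated date; the suffix stays outside the override -->
  <p><bdo dir="ltr">١٩٩٩/١٢/٣١</bdo> م</p>
</div>
```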
lrm and rlm characters

The left-to-right mark (&lrm; = &#x200E;) and the right-to-left mark (&rlm; = &#x200F;) are alternative ways to specify the direction of neutral characters such as punctuation marks or spaces. The above examples are rewritten here using &lrm;.
The vowels α ε η ι ο derive from א ה ח י ע, resp.
The vowels α ε η ι ο derive from א ה ח י ע, resp.
1999-12-31
1999-12-31
١٩٩٩‑١٢‑٣١
١٩٩٩‑١٢‑٣١
١٩٩٩/١٢/٣١ م
١٩٩٩/١٢/٣١ م
© 2004 Google – 90 00 000 ويب صفحات كى تلاش هو رهى هے
© 2004 Google – 90 00 000 ويب صفحات كى تلاش هو رهى هے
© 2004 Google – 9000000 ويب صفحات كى تلاش هو رهى هے
The second line does not work in Internet Explorer 5, which needs a number without spaces. This example shows that the explicit markup with the dir attribute is more reliable than the implicit &lrm; and &rlm; marks.
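One way to apply the marks, offered as a sketch since the page's own source is not shown here: an &lrm; after each Hebrew letter leaves every space between a left-to-right mark and a right-to-left letter, so the space follows the left-to-right direction of the paragraph and the letters run from left to right.

```html
<!-- No bdo element: each space is resolved left-to-right because of the preceding &lrm; -->
<p dir="ltr">The vowels α ε η ι ο derive from
  א&lrm; ה&lrm; ח&lrm; י&lrm; ע&lrm;, resp.</p>
```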
zwnj character

The zero-width non-joiner (&zwnj; = &#x200C;) is necessary for writing Persian where certain affixes and compound words do not join. It is shown by a hyphen in the transliterated words below.
هفته | hafteh | week |
هفتهها | hafteh-hā | weeks |
هفتهها | haftehhā | wrong |
موزه | mūzeh | museum |
موزهها | mūzeh-hā | museums |
موزهها | mūzehhā | wrong |
سه | seh | three |
سهشنبه | seh-šanbeh | Tuesday |
سهشنبه | sehšanbeh | wrong |
راه | rāh | way, road |
راهآهن | rāh-āhan | railway |
راهآهن | rāh’āhan | wrong |
نرم | narm | soft |
نرمافزار | narm-afzār | software |
نرمافزار | narmāfzār | wrong |
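In the source, the non-joiner is best written as a character reference so that it stays visible to whoever edits the file. A minimal sketch using the first word pair from the table:

```html
<!-- ZWNJ keeps the plural suffix from joining to the final he of the stem -->
<p dir="rtl" lang="fa">هفته&zwnj;ها</p>

<!-- The same word with the numeric reference instead of the named entity -->
<p dir="rtl" lang="fa">هفته&#x200C;ها</p>
```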
zwj character

The zero-width joiner (&zwj; = &#x200D;) is necessary to show the connecting forms of Arabic letters in isolation. At least Mozilla needs it when Arabic letters are separated by HTML markup. (The zero-width joiner does not work with earlier browser versions such as Netscape 7.0 or Internet Explorer 5.)
جسيم | jasīm | gros |
جسام | jisām | gros pl. |
جسيمة | jasīmah | grosse |
جسيمات | jasīmāt | grosses |
أجسم | ajsam | plus gros(se(s)) |
الأجسم | al-ajsam | le plus gros |
الأجاسم | al-ajāsim | les plus gros |
الجسمى | al-jusmā | la plus grosse |
الجسميات | al-jusmayāt | les plus grosses |
ن · س · ت · ع · ل · ي · ق ← ن س ت ع ل ي ق ← نستعليق
ن · س · ت · ع · ل · ي · ق ← ن س ت ع ل ي ق ← نستعليق
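A sketch of how individual connecting forms can be requested with the joiner, using the letter noon from the line above. It illustrates the effect of the character and is not the page's actual source.

```html
<p dir="rtl">
  ن            <!-- isolated form -->
  ن&zwj;       <!-- initial form: joins on the following side only -->
  &zwj;ن&zwj;  <!-- medial form: joins on both sides -->
  &zwj;ن       <!-- final form: joins on the preceding side only -->
</p>

<!-- Letters separated by markup: a joiner on each side of the tag boundary
     asks for the joining forms, which the text says Mozilla requires -->
<p dir="rtl">ن&zwj;<b>&zwj;س&zwj;</b>&zwj;ت</p>
```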
On the other hand, Internet Explorer 6 joins letters even when they are separated by markup. Therefore you still need an additional &zwnj; if the letters are not meant to join.
سههزار ،
دههزار
سههزار ،
دههزار
The zero-width joiner can also be used to write Urdu text in and for the restricted MacUrdu character set where the two-eyed he (ھ) is not available.
هفته | haftah | week |
هاته | hāth | wrong |
هاته | hāth | hand |
ديده | dīdah | eye |
دوده | dūdh | wrong |
دوده | dūdh | milk |
The sequence &zwj;&zwnj; is needed for Sindhi where the initial form of the letter he (ﻫ) is used as a consonant, while the connecting form (ﻬ) is reserved for aspiration.
جهنگل | jhangalu | jungle |
گهر | gharu | house |
منهن | munhun | wrong |
منهن | munhun | mouth |
ويه | vīha | wrong |
ويه | vīha | twenty |
Persian word processing / ZWNJ – ZWJ
Andreas Prilop
30 August 2007