Understanding and writing RTL pages

"RTL" is the problem of getting your website to look right to a speaker of Arabic or Hebrew, where the writing goes from right to left. To see what that looks like, here's an excerpt from the Universal Declaration of Human Rights, translated[1] into Hebrew.
כל בני האדם נולדו בני חורין ושווים בערכם ובזכויותיהם. כולם חוננו בתבונה ובמצפון, לפיכך חובה עליהם לנהוג איש ברעהו ברוח של אחווה.
In Hebrew, the understanding of the UI idioms "left" and "right" is also mirrored. To RTL speakers, "left" means "forward" and "right" means "backward". So the typical website, with its navigation area on the left and its content on the right, looks wrong.
A surprising aspect of Hebrew (and Arabic) is that although they are written right to left, numbers are still written left to right:
בשנת 1885 נישא לאלן לואיז אקסון שמתה בשנת 1914.
If you open your page inspector, you'll see that I've set the attribute dir="rtl" on the <blockquote>s containing Hebrew text. This makes the magic possible: it sets the base direction of the content. The base direction informs the Unicode Bidirectional Algorithm[2] of the expected orientation of the text, which is to say the instinctive reading order of the expected audience. (It also affects how HTML layout is done, as we will see.) If I reproduce the examples without dir="rtl" we get:
כל בני האדם נולדו בני חורין ושווים בערכם ובזכויותיהם. כולם חוננו בתבונה ובמצפון, לפיכך חובה עליהם לנהוג איש ברעהו ברוח של אחווה.
בשנת 1885 נישא לאלן לואיז אקסון שמתה בשנת 1914.
The only thing that is affected in these examples (ignoring the alignment for now) is the position of the (last) full stop. For larger and more complicated texts there are worse outcomes for not setting the right base direction. To understand why, we have to understand the basics of the BiDi algorithm.

A little BiDi algorithm

There is no substitute for the spec, so to begin let's ponder:
Each character has an implicit bidirectional type. The bidirectional types left-to-right and right-to-left are called strong types, and characters of those types are called strong directional characters. The bidirectional types associated with numbers are called weak types, and characters of those types are called weak directional characters. With the exception of the directional formatting characters, the remaining bidirectional types and characters are called neutral. The algorithm uses the implicit bidirectional types of the characters in a text to arrive at a reasonable display ordering for text.
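There are three flavors of bidirectional type (strong, weak, and neutral), and strong characters are either right-to-left or left-to-right. Using "<" for strong right-to-left, ">" for strong left-to-right, "w" for weak, and "-" for neutral, let's annotate an example: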
'מזל טוב' means 'hello'.
-<<<-<<<-->>>>>-->>>>>-w
The text above is in logical order, also called storage order, which is the order of the codepoints as stored in memory or disk. The job of the BiDi algorithm is to figure out the visual order, given the intrinsic directionality of the characters and the base direction.
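As a rough illustration, Python's standard unicodedata module exposes each character's bidirectional class, so the annotation above can be reproduced mechanically. This is only a sketch: the annotate helper and its four-symbol mapping are hypothetical shorthand for this article, and explicit formatting characters are lumped in with the neutrals.

```python
import unicodedata

# Bidi classes grouped into the four symbols used in the annotation above.
STRONG_RTL = {"R", "AL"}
STRONG_LTR = {"L"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}


def annotate(text: str) -> str:
    """Return one symbol per code point: '<', '>', 'w', or '-'."""
    symbols = []
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_RTL:
            symbols.append("<")
        elif bidi_class in STRONG_LTR:
            symbols.append(">")
        elif bidi_class in WEAK:
            symbols.append("w")
        else:
            # B, S, WS, ON; this sketch also treats everything else as neutral.
            symbols.append("-")
    return "".join(symbols)


# The example string in logical order, with the Hebrew letters escaped.
sample = "'\u05de\u05d6\u05dc \u05d8\u05d5\u05d1' means 'hello'."
print(annotate(sample))  # -<<<-<<<-->>>>>-->>>>>-w
```

Escaping the Hebrew letters keeps the source in unambiguous logical order, whatever your editor does with mixed-direction text.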
The real meat and potatoes of the algorithm are resolving weak and neutral characters, "Resolving Implicit Levels" and "Reordering Resolved Levels".

Resolving weak and neutral characters

Strong characters have a given orientation, while weak and neutral types do not. This step resolves those types into strong types. The only weak character in our input is the final ".", which by rule W6 ("Otherwise, separators and terminators change to Other Neutral") is converted to a neutral character.
'מזל טוב' means 'hello'.
-<<<-<<<-->>>>>-->>>>>--
N2 "Any remaining NIs take the embedding direction." is the next applicable rule: the neutral character is not surrounded on both sides by strong characters of the same direction, so it takes the base direction, which is <. N2 also applies to the sequence at positions 8-9 and to position 22.
'מזל טוב' means 'hello'.
-<<<-<<<<<>>>>>-->>>>><<
The neutral sequences at positions 0, 4, and 15-16 are surrounded by strong characters of one direction (the start of the text counts as the base direction), so by rule N1 they assume that direction.
'מזל טוב' means 'hello'.
<<<<<<<<<<>>>>>>>>>>>><<
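The three steps above can be sketched in a few lines of Python operating directly on the annotation string. This is a deliberately simplified sketch, not the real algorithm: resolve_neutrals is a hypothetical helper, it treats every weak character as a separator or terminator (so it would mishandle digits), and it ignores rules W1-W5 and W7 entirely.

```python
import re


def resolve_neutrals(types: str, base: str) -> str:
    """Resolve 'w' and '-' symbols to '<' or '>' given the base direction."""
    # Simplified W6: every weak symbol becomes neutral. Only valid when the
    # weak characters are separators/terminators, as in the example.
    types = types.replace("w", "-")

    def fill(match: re.Match) -> str:
        start, end = match.span()
        # The start and end of the text count as the base direction.
        before = types[start - 1] if start > 0 else base
        after = types[end] if end < len(types) else base
        # N1: same direction on both sides wins; N2: otherwise use the base.
        direction = before if before == after else base
        return direction * (end - start)

    return re.sub(r"-+", fill, types)


print(resolve_neutrals("-<<<-<<<-->>>>>-->>>>>-w", base="<"))
```

With base "<" (RTL) this yields ten "<", twelve ">", and two "<", matching the final annotation above.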

Resolving implicit levels

Now that all characters have been assigned a strong directionality, the next part of the algorithm sorts out runs of < and > into separate embedding levels (EL for short). In more complicated examples, explicit formatting characters can have introduced ELs before this point. But for this simple string, we can simply say that the sequence of > characters is put in a higher EL than the rest of the string.

Reordering resolved levels

The reordering rules are L1-L4, but for this example only L2 applies. L2 works from the highest EL down to the lowest odd EL and, at each step, reverses every sequence of characters at that EL or higher. Because the base direction is RTL, the string as a whole sits at EL 1 and the English part at EL 2. So L2 first reverses the English sequence at EL 2, then reverses the entire string at EL 1, and we end up with:
'מזל טוב' means 'hello'.
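To make the reordering step concrete, here is a minimal Python sketch of rule L2 applied to already-resolved types. The reorder helper is hypothetical and skips explicit embeddings, numbers, and mirroring. In UAX #9 terms an RTL paragraph starts at level 1 and its left-to-right runs sit at level 2. Uppercase Latin letters stand in for the Hebrew ones so the output is easy to read in any editor.

```python
def reorder(text: str, types: str, base_rtl: bool = True) -> str:
    """Return text in visual (left-to-right on screen) order."""
    if not text:
        return ""
    base_level = 1 if base_rtl else 0
    levels = []
    for t in types:
        wants_odd = t == "<"  # RTL characters need an odd level
        if (base_level % 2 == 1) == wants_odd:
            levels.append(base_level)
        else:
            levels.append(base_level + 1)

    chars = list(text)
    highest = max(levels)
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    lowest_odd = min(odd_levels) if odd_levels else highest + 1
    # L2: from the highest level down to the lowest odd level, reverse each
    # contiguous run of characters at that level or higher.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = reversed(chars[i:j])
            i = j
    return "".join(chars)


logical = "'ABC DEF' means 'hello'."
resolved = "<" * 10 + ">" * 12 + "<" * 2
print(reorder(logical, resolved))  # .'means 'hello 'FED CBA'
```

The English run is reversed twice and so comes out readable, while the stand-in Hebrew is reversed once, which is what puts its first letter at the right edge.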

Other interesting little bits

The Unicode Character Database is a great resource for discovering interesting properties about Unicode characters. BidiMirroring.txt lists characters which serve as mirrors for each other when characters have been reversed for the purposes of BiDi ordering. For example, the GREATER-THAN SIGN and LESS-THAN SIGN are listed as mirrors:
003C; 003E # LESS-THAN SIGN
003E; 003C # GREATER-THAN SIGN
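Python's unicodedata module exposes the related Bidi_Mirrored property, though not the mirror pairs themselves. The sketch below combines it with a tiny table hand-copied from the two entries quoted above; MIRROR and mirror_if_rtl are hypothetical names, and a real implementation would load the whole of BidiMirroring.txt.

```python
import unicodedata

# Hand-copied from the two BidiMirroring.txt entries quoted above.
MIRROR = {"\u003c": "\u003e", "\u003e": "\u003c"}


def mirror_if_rtl(ch: str, resolved_rtl: bool) -> str:
    """Swap a mirrored character for its pair when it resolves to RTL."""
    if resolved_rtl and unicodedata.mirrored(ch):
        # Fall back to the character itself if the pair is not in our table.
        return MIRROR.get(ch, ch)
    return ch


print(unicodedata.mirrored(">"))  # 1: the character has the mirrored property
print(mirror_if_rtl(">", resolved_rtl=True))
print(mirror_if_rtl(">", resolved_rtl=False))
```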
Here is an example of BiDi mirroring in action — both <blockquote>s contain just ">":
>
>
The UCD also has a BidiBrackets.txt, which describes which pairs of characters open and close paired bracket contexts. Treating brackets as pairs solves certain issues in RTL contexts such as:
max(x, y)
If you see "max(x, y)" then your browser has implemented BidiBrackets. At the time of writing, BidiBrackets are a new spec so you are unlikely to see the correct output.
If you liked this digression you will love this BiDi utility which lets you see exactly how the algorithm applies to any input text.

dir="rtl" saves the day

Now that we understand BiDi better let's come back to making RTL webpages.
The common way to employ dir="rtl" is to inspect the Accept-Language HTTP request header and apply dir="rtl" to the <html> or <body> element if the language is either[3] Arabic (ISO 639-1 ar) or Hebrew (ISO 639-1 he). The dir attribute is inherited by the element's children[4], so it is sufficient to set it once on <html> or <body>. Let's see what Twitter does:
$ curl -sH "Accept-language: he" https://twitter.com/ | grep 'dir="rtl"'
dir="rtl">
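On the server side, the Accept-Language check might look something like the following Python sketch. The direction_for helper is hypothetical, RTL_LANGUAGES contains only the two languages named above (a real site would probably want more), and the header parsing is intentionally forgiving rather than strictly spec-compliant.

```python
# Only the two languages discussed above; extend as needed.
RTL_LANGUAGES = {"ar", "he"}


def direction_for(accept_language: str) -> str:
    """Return 'rtl' or 'ltr' based on the most preferred language tag."""
    best_tag, best_q = "", -1.0
    for item in accept_language.split(","):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0  # the default weight when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as least preferred
        if q > best_q:
            best_tag, best_q = tag, q
    # Compare on the primary subtag so that "ar-EG" counts as Arabic.
    primary = best_tag.split("-")[0]
    return "rtl" if primary in RTL_LANGUAGES else "ltr"


print(direction_for("he"))
print(direction_for("en-US,en;q=0.9"))
```

The returned string can be dropped straight into the dir attribute of the <html> element in your template.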
In general, setting dir="rtl" changes the flow of content to begin from the right edge and build leftwards. For example, table columns are ordered RTL:
12
12
inline-block (but not inline) elements are flowed RTL:
inlineblock1
inlineblock2
inlineblock3
inlineblock1
inlineblock2
inlineblock3
inline1inline2inline3
inline1inline2inline3
Scrollbars are positioned on the left:
test
test
Among other things. I haven't been able to find end-all documentation on what dir="rtl" affects. In fact, the CSS 2.1 spec has this to say about it: "The list of features affected by 'direction' is not meant to be exclusive."
In a fully-supporting browser, <html dir="rtl"> should also render the title text in the window correctly, and reorient the browser chrome. However, as of 2011, many browsers didn't apply the BiDi algorithm to <title>s[5]. The workaround suggested by W3C is to insert the Unicode directionality characters in the title text. This should be the only place where you allow Unicode orientation hints in your content.
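If you generate titles server-side, the workaround can be as small as the Python sketch below. It assumes the RIGHT-TO-LEFT EMBEDDING / POP DIRECTIONAL FORMATTING pair is the hint you want; wrap_rtl_title is a hypothetical helper, and you should check the W3C guidance for the exact characters it recommends.

```python
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def wrap_rtl_title(title: str) -> str:
    """Wrap a title so it is laid out with an RTL base direction."""
    return f"{RLE}{title}{PDF}"


# A Hebrew word followed by Latin text, in logical order.
wrapped = wrap_rtl_title("\u05e9\u05dc\u05d5\u05dd - Example")
print(len(wrapped))  # two invisible characters longer than the input
```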

dir="rtl" does not save the day

Setting the base direction for your content will solve many or all of your RTL problems if your website is simple or if you had a tremendous amount of foresight. But you will likely have introduced positioned elements or asymmetries through float, position, background-position, margin, padding, or other properties. Unfortunately, setting the base direction doesn't change how those properties are interpreted:
padding-right: 1em;test
padding-right: 1em;test
There are several ways you could approach writing RTL-robust CSS. One fast way is to use Google Closure Stylesheets, which has an --output-orientation option[6] that rewrites these properties for you: margin-left: 5px becomes margin-right: 5px. For properties which you don't want rewritten there is an annotation, /* @noflip */.
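To give a feel for what such a rewrite involves, here is a toy Python sketch that swaps "left" and "right" in a single CSS declaration and honours a noflip annotation. It is not how Closure Stylesheets is implemented; flip_declaration is a hypothetical helper, and it ignores harder cases such as four-value shorthands, background-position percentages, and the word "left" inside a URL.

```python
import re

_SWAP = {"left": "right", "right": "left"}
_SIDE = re.compile(r"\b(left|right)\b")


def flip_declaration(line: str) -> str:
    """Swap left/right in one CSS declaration unless it is marked @noflip."""
    if "@noflip" in line:
        return line
    return _SIDE.sub(lambda m: _SWAP[m.group(1)], line)


print(flip_declaration("margin-left: 5px;"))
print(flip_declaration("float: right;"))
print(flip_declaration("/* @noflip */ padding-left: 1em;"))
```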
Another option is to use a CSS preprocessor like LESS, Sass, or Stylus and create a system of mixins whereby the content direction (and its inverse) are exposed as constants[7] (the example is in LESS):
@dir: right;
@dirOp: left;
(In this case, the content direction is right-to-left.) Then, you create a helper for each property which takes a direction and emits the proper CSS:
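// Emitted when the first argument is the literal keyword "left".
.margin(left, @length) {
    margin-left: @length;
}
// Emitted when the first argument is the literal keyword "right".
.margin(right, @length) {
    margin-right: @length;
}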
This has the downsides that you will need to rewrite a lot of your CSS, and to be sane you will probably want to cook up a linter so that you aren't accidentally introducing bare margin properties.

Future Developments

HTML5 introduces[8] a couple of tools for dealing with RTL.
dir=auto[9] sets the content direction to whatever strong directionality can be found in the (first N) characters of the content, or LTR if none is found. This could be useful in cases where the content may be LTR or RTL and you don't have a way of knowing in advance which it could be. This heuristic is very weak, however, because it doesn't consider the intent of the speaker: was he trying to mix Hebrew into English or English into Hebrew? If you can do better than that, you should avoid dir=auto.
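The first-strong heuristic is easy to sketch in Python with unicodedata. The first_strong_direction helper below is hypothetical and only approximates what a browser does: it scans the whole string rather than some first N characters and knows nothing about markup or nested elements.

```python
import unicodedata


def first_strong_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' from the first strong character, else 'ltr'."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    # No strong character found: fall back to LTR.
    return "ltr"


print(first_strong_direction("\u05e9\u05dc\u05d5\u05dd world"))
print(first_strong_direction("hello \u05e9\u05dc\u05d5\u05dd"))
```

The second call shows the weakness described above: one leading English word is enough to flip the whole string to LTR, whatever the writer intended.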
The new <bdi> element lets you isolate the directionality of some content, in the spirit of the Unicode LRI/RLI/FSI formatting characters. So, it would be possible to avoid all Unicode directionality formatting characters, but not all browsers support it at time of writing, which makes this element useless.

Suggested Further Reading

My treatment here isn't complete. If you really want to master this topic, I suggest at least reading:

Footnotes

  1. http://www.omniglot.com/udhr/afroasiatic.htm
  2. http://www.unicode.org/reports/tr9/
  3. http://en.wikipedia.org/wiki/Right-to-left
  4. http://www.w3.org/TR/html401/struct/dirlang.html#h-8.2.2
  5. https://www.w3.org/International/tests/html-css/bidi-chrome/results-bidi-chrome#pagetitle
  6. http://code.google.com/p/closure-stylesheets/#RTL_Flipping
  7. http://stackoverflow.com/questions/5121584/less-how-to-insert-an-variable-into-property-as-opposed-to-the-value
  8. http://annevankesteren.nl/2010/11/html5-bidirectional-text
  9. http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#attr-dir-auto