Transmission Line Noise

Multilingual web sites

Tue, Jun 7, 2011

I have not seen another one, and probably for the following reason. In spite of what it says on the SimplePO web site, translators do not like, and often will not work on, side by side translation systems as shown above.

That is how programmers imagine translators will work, and it is flawed. Translators work with translation memory tools built around TMX, Translation Memory eXchange (a generic format; see Okapi, an open source implementation, to get a feel for it), and in these they build up translation dictionaries for words, phrases and sentences. They then take a file, which can be in any of several formats, and feed it into the TMX software. This gives them a first pass that is 60%, 70% etc. translated but, like the output of the Google language API, horribly mangled in terms of making sense in the target language.
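To give a feel for what that first pass looks like, here is a minimal sketch in Python. It is not how Okapi or any real translation memory tool works internally: the `memory` dictionary, the `first_pass` function and the 0.75 similarity cutoff are all made up for illustration, and the fuzzy matching simply uses the standard library's `difflib`.

```python
import difflib

# Hypothetical translation memory: source segment -> stored translation.
memory = {
    "Save your changes": "Enregistrez vos modifications",
    "Delete this file": "Supprimez ce fichier",
    "Welcome back": "Bon retour",
}

def first_pass(segments, memory, cutoff=0.75):
    """Return (segment, suggestion, kind) for each source segment.

    kind is "exact", "fuzzy" (close match that needs review) or "new"
    (nothing usable in memory, so the translator starts from scratch).
    """
    results = []
    for segment in segments:
        if segment in memory:
            results.append((segment, memory[segment], "exact"))
            continue
        # Look for the single most similar stored segment above the cutoff.
        close = difflib.get_close_matches(segment, list(memory), n=1, cutoff=cutoff)
        if close:
            results.append((segment, memory[close[0]], "fuzzy"))
        else:
            results.append((segment, None, "new"))
    return results

source = ["Welcome back", "Delete this folder", "Your invoice is ready"]
for segment, suggestion, kind in first_pass(source, memory):
    print(f"{kind:5} | {segment} -> {suggestion}")
```

The fuzzy suggestions are exactly the "mangled" part: the stored translation for a similar sentence is offered as a starting point, and the translator still has to correct it.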

Then what they do is translate the words not dealt with by the TMX software, adding to the dictionaries where logical, and they colloquialise it, i.e. make it work in the target language and make sense of it. For this reason the translator should always be translating into their native language.

They do it this way for a number of reasons: a) it makes sense, works and reduces their workload, and b) they get paid by the word, and side by side translation does not allow them to use their tools and maximise their income.

What translators want is a file in a format that you can export and they can import, translate, export and send back to you to import.

The file formats can be CSV, RTF, TMX, XLIFF or gettext, and if you read the Symfony framework docs you can see how they handle it (they do a pretty good job in my opinion).
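As a rough sketch of that round trip, the following Python uses CSV, the simplest of those formats. The string table layout, the function names and the column names are assumptions for illustration, not taken from Symfony or any other framework.

```python
import csv
import io

def export_for_translation(strings, target_lang):
    """Write untranslated entries as CSV text: key, source, empty target column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["key", "source", target_lang])
    for key, entry in strings.items():
        if not entry.get(target_lang):
            writer.writerow([key, entry["en"], ""])
    return buffer.getvalue()

def import_translation(strings, csv_text, target_lang):
    """Merge a returned CSV back in; skip blank rows and unknown keys."""
    reader = csv.DictReader(io.StringIO(csv_text))
    imported = 0
    for row in reader:
        text = (row.get(target_lang) or "").strip()
        if text and row["key"] in strings:
            strings[row["key"]][target_lang] = text
            imported += 1
    return imported

# Hypothetical string table keyed by message id.
strings = {
    "greeting": {"en": "Hello", "fr": "Bonjour"},
    "farewell": {"en": "Goodbye"},
    "thanks": {"en": "Thank you"},
}

outgoing = export_for_translation(strings, "fr")
# Simulate the translator filling in one row and sending the file back.
returned = outgoing.replace("farewell,Goodbye,", "farewell,Goodbye,Au revoir")
count = import_translation(strings, returned, "fr")
print(count, strings["farewell"])
```

Only the untranslated entries go out, and only the rows that come back filled in are merged, so the translator can work through the file in their own tools and return it in stages.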

Having said all that, I was in a similar position about 8 years ago when I had to write a site in English, French, German, Hungarian and Slovakian, and I did the same as SimplePO and simply wrote my own side by side application to allow this to be done. However, the company we were writing the application for did all their own translation in house, so we didn't hit the problem with translators. When we did, we wrote an export to RTF and an import from RTF (that in itself is mind boggling) so the translators could function as above.

However, SimplePO is the only other implementation of the idea I have seen. Frameworks such as Zend seem to think you just create lookup tags to replace words and phrases, and they build no control into the application to manage the process. Consequently it soon gets out of hand and the maintenance of it becomes both difficult and expensive.
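To make the point about control concrete, here is a minimal Python sketch of tag replacement that at least records which tags have no translation yet. The `{{tag}}` syntax, the `catalogue` layout and the `render` function are all invented for this example and are not how Zend or any other framework does it.

```python
import re

# Hypothetical catalogue: language -> tag -> text.
catalogue = {
    "en": {"title": "Price list", "buy": "Buy now"},
    "de": {"title": "Preisliste"},
}

TAG = re.compile(r"\{\{(\w+)\}\}")

def render(template, lang, catalogue, fallback="en", missing=None):
    """Replace {{tag}} markers; record untranslated tags in `missing` if given."""
    def lookup(match):
        tag = match.group(1)
        text = catalogue.get(lang, {}).get(tag)
        if text is None:
            if missing is not None:
                missing.add((lang, tag))
            # Fall back to the master language, or leave the tag visible.
            text = catalogue.get(fallback, {}).get(tag, match.group(0))
        return text
    return TAG.sub(lookup, template)

missing = set()
page = render("<h1>{{title}}</h1><a>{{buy}}</a>", "de", catalogue, missing=missing)
print(page)
print(sorted(missing))
```

The `missing` set is the small bit of management that plain lookup tags lack: it gives you a list of what still has to go out to the translators.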

Most people who write multilingual web sites actually don't. They write a master site and then make a copy, translate it and maintain the translated version. It seems clunky to us logical types but is actually very effective.

One of the reasons it is effective is that i18n and l10n are about many other things than language.

  • Look and feel. Anglo-Saxons like cool colours and sans serif typefaces; Hispanic peoples like serif typefaces and brighter colours. And as you cross other cultures the expectations vary wildly in layout, type, colours etc.
  • French, and to some degree German, is about 30% longer, more verbose, than the equivalent English, so your layout goes to hell in a hand basket real quick.
  • Semitic languages such as Arabic and Hebrew run right to left.
  • Japanese and other languages that are not alphabet based can run ltr, rtl or top to bottom, and some do not even have white space.
  • Dates? US, Japanese, UK and Hungarian are all different.
  • Currency and number formats, don't even start me off.
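The date and number points are easy to show with a small Python sketch. The `FORMATS` table below is hand-written for illustration and covers only a few common conventions; a real site would take these from proper locale data rather than a table like this, and the function names are my own.

```python
from datetime import date

# Hand-written conventions, for illustration only.
FORMATS = {
    "en_US": {"date": "{m:02d}/{d:02d}/{y}", "thousands": ",", "decimal": "."},
    "en_GB": {"date": "{d:02d}/{m:02d}/{y}", "thousands": ",", "decimal": "."},
    "hu_HU": {"date": "{y}. {m:02d}. {d:02d}.", "thousands": " ", "decimal": ","},
    "ja_JP": {"date": "{y}年{m}月{d}日", "thousands": ",", "decimal": "."},
}

def format_date(day, locale_id):
    """Render a date using the pattern assumed for the given locale."""
    pattern = FORMATS[locale_id]["date"]
    return pattern.format(y=day.year, m=day.month, d=day.day)

def format_number(value, locale_id):
    """Render a number with two decimals and locale-specific separators."""
    conv = FORMATS[locale_id]
    whole, frac = f"{value:,.2f}".split(".")
    return whole.replace(",", conv["thousands"]) + conv["decimal"] + frac

posted = date(2011, 6, 7)
for locale_id in FORMATS:
    print(locale_id, format_date(posted, locale_id), format_number(1234567.891, locale_id))
```

Note that the same date comes out as 06/07/2011 for the US and 07/06/2011 for the UK, which is exactly the sort of thing a plain word-for-word translation of a page will never catch.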

Well, sorry to go on, and to summarise: for simple side by side, just write it yourself. It took me about two weeks without any frameworks, working it out as I went along; just use tag replacement. But any more and consider what you are doing. Carefully.