2. Tagging an HTML file

If you open the source code of virtually any HTML file, you will see a LOT of tags, so changing the styles manually is simply not workable. You need another piece of software to tag (prepare) the file. This is rather easy to do for HTML and for other relatively common formats such as XML and SGML. My personal preference goes to a program called Rainbow (freeware). There are other options, such as +Tools (also freeware).

The process is rather simple and well explained in the documentation of both programs, so I won’t belabor it. In Rainbow (once installed), you click “Add”, select the HTML files you need to prepare, go to the Tools menu, select “Prepare for translation”, fill out the needed options and, under the “Package” tab, select where the tagged files should be created.

Some options may look complex, but frankly it’s a no-brainer when all you have to do is prepare an HTML file.

Find the tagged files, open the RTF file in Word, and you are ready to translate.
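To make the idea of “tagging” more concrete, here is a minimal Python sketch (standard library `html.parser` only) of what a preparation tool does conceptually: it separates the markup, which must be protected, from the text, which must be translated. This is not how Rainbow or +Tools actually work internally; `TagSplitter` and `split_html` are hypothetical names made up for this example, and the sketch ignores details such as the content of `<script>` and `<style>` elements, which would wrongly come out as translatable text.

```python
from html.parser import HTMLParser


class TagSplitter(HTMLParser):
    """Hypothetical helper: split HTML into ("tag", ...) and ("text", ...) pieces."""

    def __init__(self):
        # Keep entities such as &eacute; exactly as written in the source
        super().__init__(convert_charrefs=False)
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as written
        self.pieces.append(("tag", self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>, recorded once, as written
        self.pieces.append(("tag", self.get_starttag_text()))

    def handle_endtag(self, tag):
        # The parser lowercases tag names, so the closing tag is rebuilt here
        self.pieces.append(("tag", f"</{tag}>"))

    def handle_comment(self, data):
        self.pieces.append(("tag", f"<!--{data}-->"))

    def handle_decl(self, decl):
        self.pieces.append(("tag", f"<!{decl}>"))

    def handle_data(self, data):
        # Plain text: this is what the translator works on
        self.pieces.append(("text", data))

    def handle_entityref(self, name):
        self.pieces.append(("text", f"&{name};"))

    def handle_charref(self, name):
        self.pieces.append(("text", f"&#{name};"))


def split_html(source):
    """Return the list of (kind, content) pieces for an HTML string."""
    splitter = TagSplitter()
    splitter.feed(source)
    splitter.close()
    return splitter.pieces


for kind, content in split_html("You are translating an <b>HTML</b> file!"):
    print(kind, repr(content))
```

Joining all the pieces back together reproduces the original markup, which is what lets a tool rebuild the HTML once only the “text” pieces have been translated.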

3. Translating a tagged file

This depends on your CAT tool. In Wordfast, start the translation as usual, with your TM and glossaries, the door bolted, gaffer tape across the neighbor’s kid’s mouth, Mozart playing (or AC/DC, your call)… whatever your set-up usually is when you translate. ;-)

Tags in the tw4winInternal style are treated as placeables. You can select them in the source segment using “Ctrl + Alt + Left/Right”, and “Ctrl + Alt + Down” copies the selected tag into the target segment, at the insertion point. Type your translation in the target and bring down the tags at the appropriate points in the target sentence.

Use the tags to see how the text will look, and do not hesitate to refer to the original HTML file when in doubt. As explained before, keep keywords in mind and balance the text to match the original’s proportions as closely as possible. (Of course, if the page is meant for an intranet rather than for the general public, this becomes much less important.)

Please refer to the “tagged files” section of your Wordfast manual. In summary, you have to make sure that you do not forget any tags (Wordfast has settings to remind you), that you keep the internal tags in the tw4winInternal style, and that the translatable text stays in whatever style was originally used.


You are translating an <b>HTML</b> file!
Vous êtes en train de traduire un fichier <b>HTML</b> !
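Wordfast’s own tag-reminder settings are what you should rely on here. Purely to illustrate what such a check looks for, the following Python sketch compares the tags of a source segment and of its translation; `tag_mismatches` is a hypothetical name made up for this example. It compares the tags as a multiset rather than as a sequence, because the target language’s word order can legitimately move tags around, and it assumes the tags appear as plain `<...>` text, as in the example pair above.

```python
import re
from collections import Counter

# Anything between < and > counts as a tag in this simplified check
TAG_RE = re.compile(r"<[^<>]+>")


def tag_mismatches(source, target):
    """Return (missing, extra): tags found only in the source, and only in the target."""
    src_tags = Counter(TAG_RE.findall(source))
    tgt_tags = Counter(TAG_RE.findall(target))
    missing = src_tags - tgt_tags  # Counter subtraction keeps positive counts only
    extra = tgt_tags - src_tags
    return sorted(missing.elements()), sorted(extra.elements())


source = "You are translating an <b>HTML</b> file!"
forgot_close = "Vous êtes en train de traduire un fichier <b>HTML !"
print(tag_mismatches(source, forgot_close))
```

Here the forgotten `</b>` is reported as missing, which is exactly the kind of omission that would make the rest of the page bold in the browser.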

4. Done. Now what?

When your translation is done and the file cleaned (meaning all source segments and segment delimiters have been deleted), you have a nice… RTF file. If neither the source nor the target language requires Unicode and you do not have special characters in the file, save it as plain text (or copy all the code into Notepad) and change the extension to “*.html” or “*.htm” (depending on the original). If you use a language that requires Unicode (Chinese, Japanese, Russian, Thai…), save the file with the appropriate encoding and modify the charset information in the file header to reflect it (e.g., UTF-8). See the HTML links to find out more about encodings and file formats.
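Once the cleaned text is in plain-text form, the charset change can also be scripted. The sketch below uses hypothetical helpers (`set_charset` and `save_html` are illustrative names, not part of Wordfast or Rainbow) that rewrite the first `charset=` value found in a `<meta>` tag, or insert `<meta charset="...">` right after `<head>` if there is none, and then write the file in that same encoding. It assumes the header uses one of the two usual forms, `<meta charset="...">` or `<meta http-equiv="Content-Type" content="text/html; charset=...">`, and it does not deal with an encoding declared only by the web server.

```python
import re
from pathlib import Path

# Captures everything up to the charset value, then the value itself
META_CHARSET = re.compile(r"""(<meta\b[^>]*?\bcharset\s*=\s*["']?)([\w-]+)""", re.IGNORECASE)


def set_charset(html, charset="UTF-8"):
    """Return the HTML with its charset declaration set to `charset` (hypothetical helper)."""
    updated, count = META_CHARSET.subn(lambda m: m.group(1) + charset, html, count=1)
    if count == 0:
        # No declaration found: add one right after the opening <head> tag
        updated = re.sub(r"(<head\b[^>]*>)", rf'\1<meta charset="{charset}">',
                         html, count=1, flags=re.IGNORECASE)
    return updated


def save_html(text, path, charset="UTF-8"):
    """Write the page using the same encoding that its header now declares."""
    Path(path).write_text(set_charset(text, charset), encoding=charset)


print(set_charset('<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'))
```

Writing the file in the same encoding that the header declares is the important part: a page saved as UTF-8 but labeled iso-8859-1 (or the reverse) shows garbled accented characters in the browser.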

If you have respected the tags, the file should look about right in the browser. However, a translation is seldom the same length as the original text, so you may have to make a few adjustments to make it fit nicely. If you are lucky, everything can stay as it is.

You are through. I hope this information will help you tackle HTML files in a professional manner and feel confident with them. As you can see, there is nothing really hard about HTML files, but they do require some extra attention. If it’s HTML, it’s not just text.

At times the client wants you to translate the text with no consideration for the HTML or for potential use on the web. That’s all right. In that case, skip all of the above and ask the client to provide a regular *.doc file, or open the HTML in Word and save it as *.doc.

[Coming soon… I still need to look for them.] Here are some HTML links that will help you learn more about HTML itself and perhaps get into some more “geeky” formats like SGML, XML, PHP, ASP, …, you name it.

Good luck. ;-)

