Translating PO files HOWTO

G. Mohanty

Revision 0.3: 18 August 2004

Introduction

This document is aimed at translators using PO files produced by the GNU gettext utilities. In the style of Linux HowTo documentation, this aims to be concise, and will avoid discussing details. While the examples are in Oriya, the advice is intended to be useful for translators working in any language.

What is needed to work as a translator?

The only absolute requirement is to have an interest in being involved in the process of making an open-source Oriya computing platform available. Given an enthusiasm for this, everything else can be worked around.

This section lists software and other tools needed to work as a translator. It is divided into two parts: the first listing essential pre-requisites, and the second things that are recommended, but are not indispensable.

Essential requirements

While excellent translation is an art rather than a science, we are not aiming at creating literary masterpieces. You can work as a translator if you have a good vocabulary in at least one of Oriya or English, with a decent grasp of Oriya grammar. While the amount of time that you spend on this project is, of course, at your discretion, we ask that you commit to an average of roughly an hour a day, or at least five hours a week. If you do not have sufficient time to devote to translation, you can still act as a reviewer, whose job it is to go over translations made by other people and check for grammar, readability, consistency, etc. Typically a reviewer will be more fluent in Oriya than the average translator. Finally, if you are not conversant with Oriya, or have even less time to spare, you can still take part as a tester who runs the actual program under an Oriya language setting, looking for things like contextual errors.

The bare minimum of software that you will need includes an Oriya font, and an editor that allows you to edit Oriya text and save in UTF-8. We strongly recommend using the OpenType Oriya font available on the project homepage [1], and the yudit Unicode text editor [2]. Support for other editors cannot be guaranteed. Installation packages including yudit, fonts, and some documentation, are available on a CD for Linux, Windows 95/98, Windows XP, and Windows 2000. The Windows packages include an automatic installer, and one is in the works for Linux. At present, for installing under Linux, please read the primer on getting started [3]. It is also possible to write up translations directly on a printed copy of a PO file, but given our limited manpower for data entry tasks, we would prefer that you learn to type in Oriya using one of the packages mentioned above.

In order to ensure consistency among different translators, you will also need a copy of the Rebati glossary [4]. The glossary should be strictly followed, and in the rare event that you do need to deviate from the glossary, you should try to communicate this fact. Finally, you should be agreeable to having all your translation work made available under the open-source GNU General Public License (GPL). We also ask that you assign copyright of your work to the Free Software Foundation, which is better equipped to deal with any legal issues that might arise.

Recommended stuff

Besides the pre-requisites above, we ask all translators to join the project mailing list [5]. List archives [6] are open to the public. Electronic communication is an essential tool for a far-flung group like ours, and one of the first needs for the longevity of the project is to develop a thriving online community. Likewise, it would be helpful if people would also volunteer to share in the burden of organizational work for the project.

The first thing to do is to read some documentation so as to get a grasp of what is involved. Ideally, you should also at least skim through the gettext tutorial [7] and the resources noted in the translation plan [8]. The rest of this document assumes that you know what PO and POT files are, and understand some gettext basics. You can also read through some other such translation HowTos [9,10].

At least in the beginning, you are bound to come across some terms that are not in the glossary. Thus, we recommend that you also equip yourself with an English-Oriya dictionary. A fairly good, inexpensive one is available from the Orissa State Bureau of Textbook Preparation [11]. When adding new words, remember to choose carefully for comprehensibility, and to communicate your choice of any new term so that it can find its way into the glossary.

Some people find it useful to start from Hindi or Bengali translations of the PO files that they are working on. In this case, you will also need to install under yudit the Hindi and/or Bengali fonts on the CD (see Sec. 3). Finally, if you plan to work as a tester, you will also need to install the development version of GNOME. As this might be unstable at times, we strongly recommend that it be installed alongside your existing GNOME desktop, rather than replacing it. While there is a pretty nice setup for downloading and installing the development version of GNOME, discussion of that process is outside the scope of this document.

Some yudit tips

It might not be immediately obvious, but one can also type in Oriya at the yudit command line at the bottom of the editor window, using the same keyboard layout as in the main window. To do that, replace the line
  yudit.command.font=default
with,
  yudit.command.font=TrueType
in the yudit.properties configuration file (see footnote 1). You can check that this works by doing a Find (Alt-Q), which will position the cursor in the command area. Now, you can switch between Oriya input (normally, F2) and English (straight) input (normally, F1). You can search for Oriya text using Find.
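
As a rough illustration of the configuration change described above, the following Python sketch rewrites that one setting in the text of a yudit.properties file. The function name and the second line of the sample are hypothetical, and the sketch works on a string rather than on a real file, so nothing on disk is touched.

```python
def enable_command_font(properties_text):
    """Return the properties text with the command-line font set to TrueType."""
    out = []
    for line in properties_text.splitlines():
        # Only the exact default setting named in this HOWTO is changed.
        if line.strip() == "yudit.command.font=default":
            line = "yudit.command.font=TrueType"
        out.append(line)
    return "\n".join(out) + "\n"


# "some.other.setting" is a made-up key, present only to show that
# unrelated lines are passed through untouched.
sample = "some.other.setting=unchanged\nyudit.command.font=default\n"
updated = enable_command_font(sample)
print(updated, end="")
```

To apply this to a real installation, you would read the file named in the footnote, pass its contents through the function, and write the result back after making a backup copy.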

yudit also has a find-and-replace function, though it is not accessible via the menu. Click in the command area, and type

  replace old-text new-text

The old text is highlighted as it is found, and hitting <return> does the replacement. A second <return> takes you to the next occurrence of the old text, if any. Thus, you can fix a common misspelling in Oriya with, ``replace r]p rZp''.

Introduction to PO files

PO files typically start with several comment lines, followed by a header entry, and then by the entries to be translated. Each entry, including the header entry, consists of a msgid/msgstr pair. The header entry is special, and is described below (Sec. 4.2).

For all entries other than the header, the msgid line contains the double-quoted English string to be translated, and the Oriya translation is to be entered between the double quotes on the corresponding msgstr line. Both msgid and msgstr strings may be split over several lines, but each line needs to be double-quoted, and there should be no blank lines between multiple lines that are part of the same string.
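
The rule about strings split over several lines can be made concrete with a small Python sketch. It is not part of the gettext tools: the function names are hypothetical, the sample entry is invented, and only the most common escapes (\n, \t, \" and \\) are decoded. It simply joins the double-quoted pieces of one msgid or msgstr into the single string that they represent.

```python
# Escapes decoded by this sketch; real PO files may use a few more.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unquote(piece):
    """Strip the surrounding double quotes and decode common escapes."""
    if len(piece) < 2 or piece[0] != '"' or piece[-1] != '"':
        raise ValueError("each line of a PO string must be double-quoted: %r" % piece)
    body = piece[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            # Unknown escapes fall back to the escaped character itself.
            out.append(ESCAPES.get(body[i + 1], body[i + 1]))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def join_po_string(lines):
    """Concatenate the quoted pieces of one msgid or msgstr."""
    pieces = []
    for line in lines:
        text = line.strip()
        for keyword in ("msgid", "msgstr"):
            if text.startswith(keyword + " "):
                text = text[len(keyword):].strip()
                break
        pieces.append(unquote(text))
    return "".join(pieces)


# An invented entry whose msgid is split over three lines.
entry = [
    'msgid ""',
    '"Could not open the file.\\n"',
    '"Please check the file name."',
]
joined = join_po_string(entry)
print(repr(joined))
```

Note that the empty string on the msgid line contributes nothing: the pieces are joined exactly as written, with no space or newline added between them.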

Each comment line starts with the `#' character, and there are two types of comments: the first having some whitespace immediately after the `#', and the second having some non-whitespace character after the `#'. The first kind are added exclusively by the translator, while the second kind are typically created and maintained by the GNU gettext utilities, though in some cases the translator might need to modify them.
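
The distinction between the two kinds of comment can be sketched in Python as below. The function name is hypothetical, and treating a bare `#' with nothing after it as a translator comment is an assumption made for this sketch.

```python
def classify_comment(line):
    """Classify one line of a PO file by the character after the '#'."""
    if not line.startswith("#"):
        return "not a comment"
    rest = line[1:]
    # Assumption: a bare "#" is counted with the translator comments.
    if rest == "" or rest[0].isspace():
        return "translator"
    return "automatic"


# Sample lines; the source file name in the third one is invented.
for sample in ["# checked against the glossary", "#, fuzzy", "#: src/main.c:12", 'msgid ""']:
    print(sample, "->", classify_comment(sample))
```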

Special comments

Comment lines beginning with `#,' have a special meaning, as they provide a comma-separated list of flags that give the translator some additional information. Two of the most commonly encountered flags are `fuzzy' and `c-format', and you might see them combined on a single line as ``#, fuzzy, c-format''. The `fuzzy' flag can be generated by the gettext utilities, or can be inserted by the translator, and serves as a warning that this entry might not have been correctly translated. The `c-format' flag, or its opposite, the `no-c-format' flag, should only be added and modified automatically by the gettext utilities. The `c-format' flag indicates that the untranslated string is supposed to be a C format string for printf, sprintf, etc., so that any components of the string such as \n, %s, %d, %i, %ld, etc., should be retained as is. Conversely, the `no-c-format' flag indicates that the entry is not a C format string, so that the % has no special meaning.
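
A quick way to see what ``retained as is'' means for a `c-format' entry is to compare the format directives in the msgid and the msgstr. The Python sketch below does this under several assumptions: the function names are hypothetical, the regular expression covers only a simplified subset of printf directives, directives are expected in the same order in both strings, and the translated strings are stand-in text rather than real Oriya. It is not a substitute for the checks done by the gettext tools themselves.

```python
import re

# Simplified subset of printf directives: optional flags, width, precision
# and length modifier, followed by a conversion character (or %% itself).
SPEC = re.compile(r"%[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|z)?[sdiuxXcfgeEp%]")


def format_specs(text):
    """List the format directives in text, ignoring the literal %%."""
    return [m for m in SPEC.findall(text) if m != "%%"]


def same_format_specs(msgid, msgstr):
    """True if both strings carry the same directives in the same order."""
    return format_specs(msgid) == format_specs(msgstr)


msgid = "Saved %d files to %s\n"
good = "XXX %d YYY %s\n"   # stand-in for a correct translation
bad = "XXX %s YYY\n"       # stand-in for a translation that lost %d

print(format_specs(msgid))
print(same_format_specs(msgid, good))
print(same_format_specs(msgid, bad))
```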

Format for PO file header

Though there are no absolute guidelines for what belongs in a PO file header, we have decided to follow the convention adopted by the GNOME Translation Project. Most of the information here comes from a discussion in July 2004 on the GNOME internationalization mailing list [12].

gettext provides a template for the header when it first generates a PO file, which looks like

  # SOME DESCRIPTIVE TITLE.
  # Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
  # This file is distributed under the same license as the PACKAGE package.
  # FIRST AUTHOR <EMAIL ADDRESS>, YEAR.
  #
  #, fuzzy
  msgid ""
  msgstr ""
  "Project-Id-Version: PACKAGE VERSION\n"
  "Report-Msgid-Bugs-To: \n"
  "POT-Creation-Date: 2004-07-21 05:06+0200\n"
  "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
  "Last-Translator: FULL NAME <EMAIL ADDRESS>\n"
  "Language-Team: LANGUAGE <LL@li.org>\n"
  "MIME-Version: 1.0\n"
  "Content-Type: text/plain; charset=CHARSET\n"
  "Content-Transfer-Encoding: 8bit\n"
  "Plural-Forms: PLURALEXPRESSION\n"

All uppercase entries are supposed to be replaced by appropriate values. You should also remember to remove the `fuzzy' flag from the header once you are finished modifying it.
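
Leftover template values are easy to overlook, so a simple check can help before submitting a file. The Python sketch below is only an illustration: the function name is hypothetical, the list of placeholders is taken from the template shown above and is not exhaustive, and the two sample headers are shortened.

```python
# Uppercase placeholders taken from the template header above.
PLACEHOLDERS = [
    "SOME DESCRIPTIVE TITLE",
    "PACKAGE VERSION",
    "YEAR-MO-DA",
    "FULL NAME",
    "EMAIL ADDRESS",
    "CHARSET",
]


def header_problems(header_text):
    """Return a list of things still to be fixed in a PO file header."""
    problems = []
    for token in PLACEHOLDERS:
        if token in header_text:
            problems.append("placeholder not replaced: " + token)
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#,") and "fuzzy" in stripped:
            problems.append("fuzzy flag still set on header")
    return problems


# Shortened sample headers: one untouched, one filled in.
untouched = r'''#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Content-Type: text/plain; charset=CHARSET\n"
'''

filled_in = r'''# Oriya translation of gedit.
msgid ""
msgstr ""
"Project-Id-Version: gedit.HEAD.or\n"
"Content-Type: text/plain; charset=UTF-8\n"
'''

for problem in header_problems(untouched):
    print(problem)
print(header_problems(filled_in))
```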

Each section of the header has the following meaning:

Finally, here is a complete PO file header from localizing gedit.

  # Oriya translation of gedit.HEAD.pot.
  # This file is distributed under the same license as the gedit package.
  # Copyright (C) 2004 Free Software Foundation, Inc.
  # Gora Mohanty <gora_mohanty@yahoo.co.in>, 2004. 
  #
  msgid ""
  msgstr ""
  "Project-Id-Version: gedit.HEAD.or\n"
  "Report-Msgid-Bugs-To: \n"
  "POT-Creation-Date: 2004-07-03 04:15+0200\n"
  "PO-Revision-Date: 2004-07-22 17:20+0530\n"
  "Last-Translator: Gora Mohanty <gora_mohanty@yahoo.co.in>\n"
  "Language-Team: Oriya <oriya-group@lists.sarovar.org>\n"
  "MIME-Version: 1.0\n"
  "Content-Type: text/plain; charset=UTF-8\n"
  "Content-Transfer-Encoding: 8bit\n"
  "Plural-Forms: nplurals=2; plural=( n != 1 );\n"

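The Plural-Forms line in the header above tells gettext how many plural forms the language has, and which one to use for a given number n. As a minimal sketch, the expression plural=( n != 1 ) can be mirrored in Python as below. The function names and the stand-in strings are hypothetical; in a real PO file the two forms would be the msgstr[0] and msgstr[1] translations of a plural entry.

```python
NPLURALS = 2  # from "nplurals=2" in the header


def plural_index(n):
    """Mirror the C expression plural=( n != 1 ): 0 for one item, 1 otherwise."""
    return int(n != 1)


# Stand-ins for msgstr[0] and msgstr[1] of a plural entry.
forms = ["<singular translation>", "<plural translation>"]


def choose_form(n):
    """Pick the translation that would be shown for the count n."""
    return forms[plural_index(n)]


for n in (0, 1, 2, 5):
    print(n, plural_index(n), choose_form(n))
```

With this expression, a count of zero selects the plural form, just as it does in English.
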
Some common stumbling blocks

Here is a list of some commonly encountered problems with the first few translations that were reviewed:

Bibliography

1
An Oriya font developed by Rajesh Pradhan and Andy White,
http://oriya.sarovar.org/downloads/utkalm.ttf.gz,
The file is gzipped.

2
The yudit Unicode text editor,
http://www.yudit.org.

3
G. Mohanty,
A practical primer for using Oriya under Linux, v0.3,
http://oriya.sarovar.org/docs/getting_started/index.html, 2004.

4
An English to Oriya glossary of commonly used terms in the Rebati project,
http://oriya.sarovar.org/download/rebati-glossary.pdf.gz.

5
The Rebati project mailing list,
http://lists.sarovar.org/cgi-bin/mailman/listinfo/oriya-group,
Subscribe at the above URL. Due to a bug in the sarovar.org setup, please use only the web interface to reply to the confirmation request sent to you after you ask to subscribe.

6
The archives of the mailing list for the Rebati project,
http://lists.sarovar.org/pipermail/oriya-group,
Archives are publicly available, i.e., you do not have to subscribe to the list in order to read the archives. Currently, the archives are updated daily, but that will probably change to a monthly update once the volume of messages goes up.

7
G. Mohanty,
A tutorial on Native Language Support using GNU gettext, v0.2,
http://oriya.sarovar.org/docs/gettext/index.html, 2004.

8
G. Mohanty,
A plan for Oriya localization, v0.1,
http://oriya.sarovar.org/docs/translation_plan/index.html, 2004.

9
A simplified translators HowTo from the IndLinux project Wiki,
http://www.indlinux.org/wiki/index.php/TranslatorsHowto.

10
A translators guide from the Ankur (Bengali Linux) project,
http://tldp.org/HOWTO/Bangla-HOWTO/devguide.html#transguide.

11
T. Mishra, J. B. Mohanty, B. Nanda, and S. K. Mund, editors,
Bureau's English-Oriya dictionary,
Orissa State Bureau of Text Book Preparation and Production, 2000.

12
The GNOME internationalization mailing list archives,
http://mail.gnome.org/archives/gnome-i18n/.


Footnotes

1
Under Linux, this file is in ~/.yudit for the individual user, and the system file is in /usr/share/yudit/config. For Windows, the corresponding locations are C:\HOME\.yudit and C:\Program Files\Yudit\config, assuming standard installations for both operating systems.