Translating PO files HOWTO
G. Mohanty
Revision 0.3: 18 August 2004
This document is aimed at translators using PO files produced by the GNU
gettext utilities. In the style of Linux HowTo documentation, this aims to be
concise, and will avoid discussing details. While the examples are in Oriya,
the advice is intended to be useful for translators working in any language.
The only absolute requirement is to have an interest in being involved in the
process of making an open-source Oriya computing platform available. Given an
enthusiasm for this, everything else can be worked around.
This section lists software and other tools needed to work as a translator. It
is divided into two parts: the first listing essential pre-requisites, and the
second things that are recommended, but are not indispensable.
While excellent translation is an art rather than a science, we are not aiming
at creating literary masterpieces. You can work as a translator if you have a
good vocabulary in at least one of Oriya or English, with a decent grasp of
Oriya grammar. While the amount of time that you spend on this project is, of
course, at your discretion, we ask that you commit to an average of roughly an
hour a day, or at least five hours a week. If you do not have sufficient time
to devote to translation, you can still act as a reviewer, whose job it is to
go over translations made by other people and check for grammar, readability,
consistency, etc. Typically a reviewer will be more fluent in Oriya than the
average translator. Finally, if you are not conversant with Oriya, or have
even less time to spare, you can still take part as a tester who runs the
actual program under an Oriya language setting, looking for things like
contextual errors.
The bare minimum of software that you will need includes an Oriya font, and an
editor that allows you to edit Oriya text and save in UTF-8. We strongly
recommend using the OpenType Oriya font available on the project
homepage [1], and the yudit Unicode text
editor [2]. Support for other editors cannot be
guaranteed. Installation packages, including yudit, fonts, and some
documentation, are available on a CD for Linux, Windows 95/98, Windows XP, and
Windows 2000. The Windows packages include an automatic installer, and one is
in the works for Linux. At present, for installing under Linux, please read the
primer on getting started [3]. It is also possible to
write up translations directly on a printed copy of a PO file, but given our
limited manpower for data entry tasks, we would prefer that you learn to type
in Oriya using one of the packages mentioned above.
In order to ensure consistency among different translators, you will also need
a copy of the Rebati glossary [4]. The glossary should
be strictly followed, and in the rare event that you do need to deviate from
the glossary, you should let the other translators know. Finally, you should be
agreeable to having all your translation work made available under the
open-source GNU General Public License (GPL). We also ask that you assign
copyright of your work to the Free Software Foundation, which is better
equipped to deal with any legal issues that might arise.
Besides the pre-requisites above, we ask all translators to join the project
mailing list [5]. List
archives [6] are open to the public. Electronic
communication is an essential tool for a far-flung group like ours, and one of
the first needs for the longevity of the project is to develop a thriving
online community. It would also help if people volunteered to share the
burden of organizational work for the project.
The first thing to do is to read some documentation so as to get a grasp of
what is involved. Ideally, you should also at least skim through the gettext
tutorial [7] and the resources noted in the translation
plan [8]. The rest of this document assumes that you
know what PO and POT files are, and understand some gettext basics. You can
also read through some other such translation
HowTos [9,10].
At least in the beginning, you are bound to come across some terms that are
not in the glossary. Thus, we recommend that you also equip yourself with an
English-Oriya dictionary. A good, inexpensive one is available from
the Orissa State Bureau of Textbook Preparation [11]. When adding
new words, remember to choose carefully for comprehensibility, and to
communicate your choice of a new term that should find its way into the
glossary.
Some people find it useful to start from Hindi or Bengali translations of the
PO files that they are working on. In this case, you will also need to install
the Hindi and/or Bengali fonts from the CD for use with yudit (see
Sec. 3). Finally, if you plan to work as a tester, you will
also need to install the development version of GNOME. As this might be
unstable at times, we strongly recommend that it be installed alongside your
existing GNOME desktop, rather than replacing it. Downloading and installing
the development version of GNOME is fairly well automated, but that process
is outside the scope of this document.
It might not be immediately obvious, but one can also type in Oriya at the yudit
command line at the bottom of the editor window, using the same keyboard
layout as in the main window. To do that, replace the line
yudit.command.font=default
with
yudit.command.font=TrueType
in the yudit.properties configuration file (see footnote 1). You can check
that this works by doing a Find (Alt-Q), which will position the cursor in the
command area. Now, you can switch between Oriya input (normally, F2) and
English (straight) input (normally, F1). You can search for Oriya text using
Find.
yudit also has a find-and-replace function, though it is not accessible via the
menu. Click in the command area, and type
replace old-text new-text
The old text is highlighted as it is found, and hitting <return> does the
replacement. A second <return> takes you to the next occurrence of the old
text, if any. Thus, you can fix a common misspelling in Oriya with, ``replace
r]p rZp''.
PO files typically start with several comment strings, followed by a header
entry, and then by entries to be translated. Each entry, including the header
entry, consists of a msgid/msgstr pair. The header entry is special, and is
described below (Sec. 4.2).
For all entries other than the header, the msgid line contains the
double-quoted English string to be translated, and the Oriya translation is to
be entered between the double quotes on the corresponding msgstr line. Both
msgid and msgstr strings may be split over several lines, but each line
needs to be double-quoted, and there should be no blank lines between multiple
lines that are part of the same string.
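To make the quoting rule concrete, here is a small Python sketch that joins the quoted pieces of a split msgid or msgstr into one string, and rejects a continuation line that is not double-quoted. It is our own illustration, not part of the gettext utilities: the name join_po_string is hypothetical, the PO file is assumed to have been read into a list of lines already, and Python's ast.literal_eval is used to decode the backslash escapes, which matches PO escaping for common cases such as \n and \".

```python
import ast


def join_po_string(lines):
    """Join a split msgid/msgstr into one Python string (illustrative helper).

    lines[0] is the keyword line, e.g. 'msgid ""'; any further items are
    continuation lines, each of which must be a double-quoted string.
    """
    # Keep only the quoted part of the keyword line, then the continuations.
    pieces = [lines[0].split(None, 1)[1]] + [line.strip() for line in lines[1:]]
    for piece in pieces:
        if len(piece) < 2 or not (piece.startswith('"') and piece.endswith('"')):
            raise ValueError("line is not double-quoted: %r" % piece)
    # ast.literal_eval decodes the common escapes PO files use, such as \n and \".
    return "".join(ast.literal_eval(piece) for piece in pieces)


entry = [
    'msgid ""',
    '"There is not enough disk space to save the file.\\n"',
    '"Please free some disk space and try again.\\n"',
]
print(join_po_string(entry))
```

The two quoted lines of the example entry come out as one string containing two real newline characters, which is what gettext sees when it looks up the message.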
Each comment line starts with the `#' character, and there are two types of
comments: the first having some whitespace immediately after the `#', and the
second having some non-whitespace character after the `#'. The first kind are
added exclusively by the translator, while the second kind are typically
created and maintained by the GNU gettext utilities, though in some cases the
translator might need to modify them.
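The two kinds of comments can be told apart by looking only at the character after the `#'. The Python sketch below is a hypothetical helper (comment_kind is our name, not a gettext API). The `#.', `#:', `#,' and `#~' prefixes it recognizes are the ones that appear elsewhere in this document: comments extracted from the program source, source references, flags, and obsolete entries.

```python
def comment_kind(line):
    """Classify a PO comment line by the character that follows the '#'."""
    if not line.startswith("#"):
        raise ValueError("not a comment line: %r" % line)
    rest = line[1:]
    # A bare '#' or '#' followed by whitespace is a translator comment.
    if rest == "" or rest[0].isspace():
        return "translator"
    return {
        ".": "extracted",   # comment copied from the program source
        ":": "reference",   # source file and line number
        ",": "flags",       # e.g. fuzzy, c-format
        "~": "obsolete",    # entry no longer used by the program
    }.get(rest[0], "automatic")  # any other gettext-maintained comment
```

For example, comment_kind("#: gedit/gedit-file.c:1365") returns "reference", while the bare "#" line in a header is a translator comment.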
Comment lines beginning with `#,' have a special meaning, as they provide a
comma-separated list of flags that provide the translator with some additional
information. Two of the most commonly encountered flags are `fuzzy' and
`c-format', and you might see them combined on a single line as ``#, fuzzy,
c-format''. The `fuzzy' flag can be generated by the gettext utilities, or
can be inserted by the translator, and serves as a warning that this entry
might not have been correctly translated. The `c-format' flag, or its
opposite, the `no-c-format' flag, should only be added and modified
automatically by the gettext utilities. The `c-format' flag indicates that
the untranslated string is supposed to be a C format string for printf,
sprintf, etc., so that any components of the string such as
\n, %s, %d, %i, %ld, etc., should be retained as
is. Conversely, the `no-c-format' flag indicates that the entry is not a C
format string so that the % has no special meaning.
Though there are no absolute guidelines for what belongs in a PO file header,
we have decided to follow the convention adopted by the GNOME Translation
Project. Most of the information here comes from a discussion in July 2004 on
the GNOME internationalization mailing
list [12].
gettext provides a template for the header when it first generates a PO file,
which looks like
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2004-07-21 05:06+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: PLURALEXPRESSION\n"
All uppercase entries are supposed to be replaced by appropriate values. You
should also remember to remove the `fuzzy' flag from the header once you are
finished modifying it.
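Forgotten placeholders are easy to miss by eye. The following Python sketch splits a header msgstr into its fields, lists the fields that still hold template placeholders, and produces a PO-Revision-Date value in the format given for that field below. It assumes the header msgstr has already been joined into one string with real newline characters; the helper names are hypothetical, and the placeholder list is simply copied from the template above.

```python
from datetime import datetime

# Placeholders left in the header by gettext (copied from the template above).
PLACEHOLDERS = ("PACKAGE VERSION", "YEAR-MO-DA", "FULL NAME",
                "EMAIL ADDRESS", "LANGUAGE", "CHARSET", "PLURALEXPRESSION")


def parse_header(msgstr):
    """Split a joined header msgstr into a {field: value} dictionary."""
    fields = {}
    for line in msgstr.split("\n"):
        if ":" in line:
            key, value = line.split(":", 1)  # split at the first colon only
            fields[key.strip()] = value.strip()
    return fields


def unfilled_fields(fields):
    """Names of header fields whose values still contain a placeholder."""
    return sorted(key for key, value in fields.items()
                  if any(word in value for word in PLACEHOLDERS))


def revision_date(now=None):
    """A PO-Revision-Date value, formatted like: date +"%Y-%m-%d %H:%M%z"."""
    if now is None:
        now = datetime.now().astimezone()  # local time with its UTC offset
    return now.strftime("%Y-%m-%d %H:%M%z")
```

Run on the untouched template, unfilled_fields reports every field except MIME-Version, Report-Msgid-Bugs-To, POT-Creation-Date and Content-Transfer-Encoding; on a finished header it returns an empty list.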
Each section of the header has the following meaning:
- # SOME DESCRIPTIVE TITLE: include a brief description of the
file. For Oriya PO files, this should read ``Oriya translation of
gedit.HEAD.pot'' if, for example, you are localizing the development (HEAD)
version of gedit. For a released version, use its version number instead,
e.g., for GNOME 2.4, ``Oriya translation of gedit-2-4.pot''.
- # Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER: use the current
year. If there is already an entry for a previous year, add this year with a
comma in front of it, e.g., ``2002, 2003, 2004''. Since we assign
copyright to the Free Software Foundation, Inc., our line might
read ``Copyright (C) 2004, Free Software Foundation, Inc.'' You are, of
course, free to retain copyright, but should you choose to do so, please
inform the mailing list about it. Also, any legal issues over the copyright
are solely up to you to handle.
- # This file is distributed under the same license as the PACKAGE
package: Replace `PACKAGE' with the name of the program being localized,
e.g., if translating gedit, use `gedit'.
- # FIRST AUTHOR <EMAIL ADDRESS>, YEAR: list each contributor on a
separate line, with that person's own email address. Thus, if I started
translating gedit in 2003, and S. Mohapatra took it over in 2004, the list
might read as:
# Gora Mohanty <gora_mohanty@yahoo.co.in>, 2003.
# Satya Mohapatra <satyanarayan_ray@yahoo.com>, 2004.
- Any further comments can be added below the FIRST AUTHOR line, such as
any peculiar terminology used, etc.
- #, fuzzy: this line should be removed after the header has been
modified.
- ``Project-Id-Version: PACKAGE VERSION\n'': replace PACKAGE VERSION with
a short string that uniquely identifies this file, e.g., ``gedit.HEAD.or''.
- ``Report-Msgid-Bugs-To: \n'': ignore this entry.
- ``POT-Creation-Date: 2004-07-15 05:06+0200\n'': this should have been
filled in automatically by gettext, and you should not modify it.
- ``PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n'': enter the date and time
(in this format) when the PO file was last edited. On a Linux system, the
correctly formatted date can be obtained by typing
date +"%Y-%m-%d %H:%M%z"
- ``Last-Translator: FULL NAME <EMAIL ADDRESS>\n'': enter the full name and
email address of the person who last modified the file.
- ``Language-Team: LANGUAGE <LL@li.org>\n'': enter the name and email
address of the translation team. For us, this should read ``Language-Team:
Oriya <oriya-group@lists.sarovar.org>''.
- ``MIME-Version: 1.0\n'': do not change this.
- ``Content-Type: text/plain; charset=CHARSET\n'': for GNOME translations,
replace CHARSET with UTF-8. Also make sure that yudit, or any other editor
you use, really saves the file as UTF-8.
- ``Content-Transfer-Encoding: 8bit\n'': do not change this.
- ``Plural-Forms: PLURALEXPRESSION\n'': this need only be included in a PO
file that has plural forms in it. See the discussion of plural forms below,
or read the gettext documentation for details. For Oriya (and many other
languages), this entry should read
``Plural-Forms: nplurals=2; plural=( n != 1 );\n'', which says that there is
a singular form and a plural form for nouns, with the latter being used if
the number of objects is other than 1.
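As a worked example of this Plural-Forms line: the expression is ordinary C, where a comparison evaluates to 1 when true and 0 when false, so plural=( n != 1 ) yields the index of the msgstr[] form to use for a count n. The Python sketch below is a hand translation of this one expression only (the function names are hypothetical), not a general evaluator for arbitrary Plural-Forms expressions; at run time that job is done by the gettext library. English strings stand in for the Oriya forms.

```python
def plural_index(n):
    """Index into msgstr[] for a count n, per 'nplurals=2; plural=( n != 1 );'.

    In C, (n != 1) is 0 when n is 1 and 1 otherwise, so msgstr[0] is the
    singular form and msgstr[1] the plural form.
    """
    return int(n != 1)


def choose_form(forms, n):
    """Return the translated form to use for a count of n."""
    return forms[plural_index(n)]


# English stand-ins for the msgstr[0] and msgstr[1] strings of one entry.
forms = ["Loaded %d file", "Loaded %d files"]
for count in (0, 1, 2):
    print(choose_form(forms, count) % count)
```

This prints ``Loaded 0 files'', ``Loaded 1 file'' and ``Loaded 2 files''; note that a count of 0 takes the plural form under this expression.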
Finally, here is a complete PO file header from localizing gedit.
# Oriya translation of gedit.HEAD.pot.
# This file is distributed under the same license as the gedit package.
# Copyright (C) 2004 Free Software Foundation, Inc.
# Gora Mohanty <gora_mohanty@yahoo.co.in>, 2004.
#
msgid ""
msgstr ""
"Project-Id-Version: gedit.HEAD.or\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2004-07-03 04:15+0200\n"
"PO-Revision-Date: 2004-07-22 17:20+0530\n"
"Last-Translator: Gora Mohanty <gora_mohanty@yahoo.co.in>\n"
"Language-Team: Oriya <oriya-group@lists.sarovar.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=( n != 1 );\n"
Here is a list of some commonly encountered problems with the first few
translations that were reviewed:
- use the active voice, and the most polite form of address. Thus, use
msgid "Save File"
msgstr "{\orx PA{i}l sMrxN kr{\char226}{\char93}}"
rather than any of
msgstr "{\orx PA{i}l sMrxN kr{\char91}bA}"
msgstr "{\orx PA{i}l sMrxN krAyAu}"
msgstr "{\orx PA{i}l sMrxN kr}"
- match newline symbols in `c-format' entries. Otherwise, errors will be
generated when the PO file is compiled. Thus,
#: gedit/gedit-document.c:1875
msgid ""
"There is not enough disk space to save the file.\n"
"Please free some disk space and try again.\n"
msgstr ""
"{\orx PA{i}l sMrxN pAi/ {\char14}{\char201}<r py{\char147}A{\char181} YAgA nA{\char196}/}.\n"
"{\orx dYA kr{\char91} {\char141}C{\char91} YAgA KAl{\char91} kr{\char91} p{\char93}N{\char91} <c{\char198}A kr{\char226}{\char93}}\n"
- hot keys are special key sequences that can be used to choose a menu
item from the keyboard. In the original strings, they are usually marked by
an underscore in front of the hot-key letter. Thus,
msgid "_Save File"
indicates that the hot key Alt-S triggers the ``Save File'' action. The
hot-key part of the string should be kept as is. Thus, translate this as
msgstr "{\orx PA{i}l sMrxN kr{\char226}{\char93}} (_S)"
rather than either of
msgstr "{\orx PA{i}l sMrxN kr{\char226}{\char93}}"
msgstr "{\orx PA{i}l sMrxN kr{\char226}{\char93}} (_{\orx es)}"
Do not try to replace _S with the equivalent Oriya character, as that will
be specific to a particular keyboard layout, and also will have no mnemonic
value.
- when a `c-format' flag is used for an entry, make sure that any
components of the string such as \n, %s, %d, %i, %ld,
etc., are retained as is. For example,
#. Pixel size of image: width x height in pixel
#: libeog/eog-file-selection.c:240
#, c-format
msgid "%s x %s pixel"
msgstr "%s x %s {\orx p{\char91}<{\char203}l+}"
The order of the arguments is also important. For example, in a C program, one
might write
printf("Could not allocate %d bytes to read file \"%s\"", nbytes, file);
to print a message about a memory allocation failure. Here, nbytes is a
variable holding the number of bytes that the program failed to allocate, and
file is the name of the file in question. The %d format specifier
applies to the nbytes argument and says to print it as an
integer. Likewise, the %s specifier applies to the file argument, which
is therefore printed as a string. When the printf format string is
extracted into a PO file, one might translate it in Oriya as follows:
msgid "Could not allocate %d bytes to read file \"%s\""
msgstr "{\orx PA{i}l} \"%s\"{\orx {\char139} p{\char15}{bA} pA{i/}}"
"%d {\orx bAiq+ bA{\char242}{\char91} <h{\char158} nA{\char196}/}"
However, note that this changes the order of the format specifiers, so that
the %s specifier would now be applied to the nbytes argument, and the %d
specifier to the file argument. The correct way to do this is
as follows:
msgid "Could not allocate %d bytes to read file \"%s\""
msgstr "{\orx PA{i}l} \"%2$s\"{\orx {\char139} p{\char15}{bA} pA{i/}}"
"%1$d {\orx bAiq+ bA{\char242}{\char91} <h{\char158} nA{\char196}/}"
Here %2$s and %1$d use the positional notation of the POSIX printf
specification: %N$ tells printf to take its Nth argument, so each specifier
still gets the right argument even though their order in the string has
changed. gettext accepts this notation in translations of `c-format' entries.
- mark a translated entry as `fuzzy' when you are not sure of the
translation, e.g., if there was a word that was not included in the
glossary, and you could not find an appropriate translation. Try to add a
comment about why the `fuzzy' flag was needed. Likewise, pay attention to
`fuzzy' flags added by gettext. These probably result from slight
modifications to previously translated strings due to program development,
so that the translation might have become subtly wrong. Do not go overboard
in adding `fuzzy' flags, as in that case someone else will have to
essentially retranslate the file. Normally, fuzzy entries should only be a
few percent of the total number of entries.
- the Oriya OpenType font still has a few rough edges that you will
need to work around:
- ``r mA±A'' + ``u r'' gets
converted into a
``PÐ'', e.g., in t+q[.
These should
be distinct, so continue typing it correctly. When the font is fixed,
these problems will be automatically taken care of.
- some conjuncts are missing, e.g., p+q, P+q. Continue typing
them correctly, even if they show up with a ``halant''. These will also be
corrected automatically when the font is fixed.
- sometimes a conjunct is formed even when you do not want one, e.g.,
with mAj[n+ r, when you type a consonant with a ``halant'' followed by
another consonant. To avoid this, leave a space between the character with
the ``halant'' and the next one. Note that using backspace to remove the
extra blank space will not work, even though it might appear to in yudit:
the unwanted conjunct will be formed again when the file is saved and
reopened.
- translator_credits entries: these offer you a chance to gain
recognition for your work. Instead of trying to translate these, simply
enter your name (in Oriya, if you wish) and your email address. Thus,
msgid "translator_credits"
msgstr "{\orx <gArA mAhA{\char226}{\char91}} <gora_mohanty@yahoo.co.in>"
or,
msgid "translator_credits-PLEASE_ADD_YOURSELF_HERE"
msgstr "{\orx <gArA mAhA{\char226}{\char91}} <gora_mohanty@yahoo.co.in>"
- Occasionally, you will come across sub-strings like CHECK_PATTERN, or
GTK_WRAP_CHAR in the strings to be translated. These probably represent
symbolic constants and should be left as is in the translations, i.e.,
translate the rest of the string, leaving these in English. Sometimes it might
be difficult to decide whether a symbolic constant is intended. A general rule
of thumb is that any combination of capital letters with one or more
underscores is probably a symbolic constant.
- Plural forms need special handling, where we have to distinguish between
singular and plural forms for nouns. Thus,
#: gedit/gedit-file.c:1365
#, c-format
msgid "Loaded %d file"
msgid_plural "Loaded %d files"
should be translated as,
msgstr[0] "%d {\orx PA{i}l DArN krA{yA}iC{\char91}}"
msgstr[1] "%d {\orx PA{i}lg{\char93}{\char14}k DArN krA{yA}iC{\char91}}"
If there are plural forms in a PO file, a ``Plural-Forms:'' line needs to
be added to the PO file header (see Sec. 4.2).
- entries that are commented out with `#~' are old ones that have been
made obsolete due to further development of the program. gettext has tagged
them, but has left them in because there is a possibility that they might be
brought back into the program. You should completely ignore these. You need
not translate them, nor should you delete them.
- you should strictly follow the glossary. In very rare cases, if you feel
that the glossary is wrong, discuss your proposed correction on the mailing
list before going ahead with it. At the very least, flag the entry as `fuzzy'
and add a comment as to why you chose to do what you did. If the glossary has
no translation for a given word, go through a dictionary and coin your own,
but again try to discuss these and flag such translations. You should also
separately make a note of new words that belong in the glossary, and forward
them to us.
- after finishing the translation, please take some additional time to go
over the file one more time, checking for readability and looking for errors
in grammar, spelling, etc. Ideally, this should be done along with a second
person who did not take part in the translation and can thus be an impartial
reviewer.
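Several of the problems listed above can be caught mechanically. gettext's own msgfmt already performs such checks when run with its --check (-c) option, and that is the tool to rely on. The Python sketch below only illustrates the idea for a single entry: it compares the printf directives of msgid and msgstr by argument position (so %2$s and %1$d count as a correct reordering), checks leading and trailing newlines, and looks for lost hot keys and symbolic constants. It is a simplification under stated assumptions: all names are hypothetical, the DIRECTIVE pattern covers the common C conversions but not every one, the hot-key and symbolic-constant patterns encode the rules of thumb given above, and the strings are assumed to be already joined with real newline characters.

```python
import re

# Simplified printf directive: optional "N$" position, flags, width,
# precision, length modifier, conversion. Not every C conversion is covered.
DIRECTIVE = re.compile(
    r"%(?:(\d+)\$)?[-+ #0]*\d*(?:\.\d+)?((?:hh|h|ll|l|z)?[diouxXeEfgGcsp%])"
)
# Hot key: an underscore that does not sit inside a word, e.g. "_Save File".
HOTKEY = re.compile(r"(?<![A-Za-z0-9_])_([A-Za-z0-9])")
# Rule of thumb: capital letters joined by underscores, e.g. GTK_WRAP_CHAR.
CONSTANT = re.compile(r"\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b")


def directives(text):
    """List (argument position, conversion) pairs, sorted by position."""
    found = []
    next_pos = 1
    for match in DIRECTIVE.finditer(text):
        pos, conv = match.group(1), match.group(2)
        if conv == "%":
            continue  # "%%" prints a literal percent sign and uses no argument
        if pos is None:
            pos = next_pos  # non-positional directives take arguments in order
            next_pos += 1
        found.append((int(pos), conv))
    return sorted(found)


def check_entry(msgid, msgstr, c_format=False):
    """Return a list of problems found in one translated entry."""
    problems = []
    if c_format and directives(msgid) != directives(msgstr):
        problems.append("format directives differ")
    for edge, has in (("start", str.startswith), ("end", str.endswith)):
        if has(msgid, "\n") != has(msgstr, "\n"):
            problems.append("newline at %s does not match" % edge)
    for key in HOTKEY.findall(msgid):
        if "_" + key not in msgstr:
            problems.append("hot key _%s is missing" % key)
    for name in CONSTANT.findall(msgid):
        if name not in msgstr:
            problems.append("symbolic constant %s was changed" % name)
    return problems
```

For instance, with msgid 'Could not allocate %d bytes to read file "%s"' and the `c-format' flag set, a msgstr that simply swaps %s and %d is reported as ``format directives differ'', while one using %2$s and %1$d passes.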
[1] An Oriya font developed by Rajesh Pradhan and Andy White,
http://oriya.sarovar.org/downloads/utkalm.ttf.gz. The file is gzipped.
[2] The yudit Unicode text editor, http://www.yudit.org.
[3] G. Mohanty, A practical primer for using Oriya under Linux, v0.3,
http://oriya.sarovar.org/docs/getting_started/index.html, 2004.
[4] An English to Oriya glossary of commonly used terms in the Rebati project,
http://oriya.sarovar.org/download/rebati-glossary.pdf.gz.
[5] The Rebati project mailing list,
http://lists.sarovar.org/cgi-bin/mailman/listinfo/oriya-group. Subscribe at
the above URL. Due to a bug in the sarovar.org setup, please use only the web
interface to reply to the confirmation request sent to you after you ask to
subscribe.
[6] The archives of the mailing list for the Rebati project,
http://lists.sarovar.org/pipermail/oriya-group. The archives are publicly
available, i.e., you do not have to subscribe to the list in order to read
them. Currently, the archives are updated daily, but that will probably change
to a monthly update once the volume of messages goes up.
[7] G. Mohanty, A tutorial on Native Language Support using GNU gettext, v0.2,
http://oriya.sarovar.org/docs/gettext/index.html, 2004.
[8] G. Mohanty, A plan for Oriya localization, v0.1,
http://oriya.sarovar.org/docs/translation_plan/index.html, 2004.
[9] A simplified translators HowTo from the IndLinux project Wiki,
http://www.indlinux.org/wiki/index.php/TranslatorsHowto.
[10] A translators guide from the Ankur (Bengali Linux) project,
http://tldp.org/HOWTO/Bangla-HOWTO/devguide.html#transguide.
[11] T. Mishra, J. B. Mohanty, B. Nanda, and S. K. Mund, editors, Bureau's
English-Oriya dictionary, Orissa State Bureau of Text Book Preparation and
Production, 2000.
[12] The GNOME internationalization mailing list archives,
http://mail.gnome.org/archives/gnome-i18n/.
Footnotes
1. The yudit.properties file: under Linux, the per-user copy is in ~/.yudit
and the system-wide copy is in /usr/share/yudit/config. Under Windows, the
corresponding locations are C:\HOME\.yudit and C:\Program Files\Yudit\config,
assuming standard installations of both operating systems.