Extracting translatable strings

The next step is to extract the strings to be translated from the program source code. This is done with xgettext, which can be invoked as follows:

  xgettext -d hello -s -o hello.pot hello.c
This processes the source code in hello.c and saves the output in hello.pot (the argument to the -o option). The -s option tells xgettext to produce sorted output. The message domain for the program is given as the argument to the -d option, and should match the domain specified in the call to textdomain (on line 9 of the program source). Further details on using xgettext can be found in "man xgettext".
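Since the -d argument has to agree with the textdomain call, the check can be scripted. The sketch below is only an illustration: hello_sketch.c is a made-up stand-in for the tutorial's hello.c (its contents and the /usr/share/locale path are assumptions), and the sed pattern only handles a domain given as a literal string.

```shell
#!/bin/sh
# Hypothetical stand-in for hello.c; the real program source may differ.
cat > hello_sketch.c <<'EOF'
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

int main(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain("hello", "/usr/share/locale");
    textdomain("hello");
    printf("%s", gettext("Hello, world!\n"));
    return 0;
}
EOF

# Extract the domain from the textdomain("...") call.
# The [^a-z] guard keeps bindtextdomain(...) from matching.
domain=$(sed -n 's/.*[^a-z]textdomain *("\([^"]*\)").*/\1/p' hello_sketch.c)

# Print (not run) the xgettext command line that uses the same domain.
echo "xgettext -d ${domain} -s -o ${domain}.pot hello_sketch.c"
```

The script only prints the command rather than running it, so it works even where the gettext tools are not installed.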

A .pot (portable object template) file is used as the basis for translating program messages into any language. To start translating, one could simply copy hello.pot to oriya.po (which preserves the template file for later translation into a different language). However, the preferred way is to use the msginit program, which takes care of setting up some default values correctly:

  msginit -l or_IN -o oriya.po -i hello.pot
Here, the -l option defines the locale (an Oriya locale should be installed on your system), and the -i and -o options define the input and output files, respectively. If there is only a single .pot file in the directory, it will be used as the input file, and the -i option can be omitted. For me, the oriya.po file produced by msginit looked like this:
  # Oriya translations for PACKAGE package.
  # Copyright (C) 2004 THE PACKAGE'S COPYRIGHT HOLDER
  # This file is distributed under the same license as the PACKAGE package.
  # Gora Mohanty <gora_mohanty@yahoo.co.in>, 2004.
  #
  msgid ""
  msgstr ""
  "Project-Id-Version: PACKAGE VERSION\n"
  "Report-Msgid-Bugs-To: \n"
  "POT-Creation-Date: 2004-06-22 02:22+0530\n"
  "PO-Revision-Date: 2004-06-22 02:38+0530\n"
  "Last-Translator: Gora Mohanty <gora_mohanty@yahoo.co.in>\n"
  "Language-Team: Oriya\n"
  "MIME-Version: 1.0\n"
  "Content-Type: text/plain; charset=UTF-8\n"
  "Content-Transfer-Encoding: 8bit\n"
 
  #: hello.c:10
  msgid "Hello, world!\n"
  msgstr ""
msginit prompted for my email address, and probably obtained my real name from the system password file. It also filled in values such as the revision date, language, and character set, presumably using information from the or_IN locale.
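To see which header fields have been filled in and which are still template placeholders, standard text tools are enough. The following is a sketch under a few assumptions: it recreates part of the header from the listing above in a scratch file (the name sample.po is made up for illustration) so that it runs without the gettext tools; in practice the same commands would be pointed at oriya.po.

```shell
#!/bin/sh
# Recreate part of the header shown above in a scratch file.
# "sample.po" is a hypothetical name used only for this illustration.
cat > sample.po <<'EOF'
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"Last-Translator: Gora Mohanty <gora_mohanty@yahoo.co.in>\n"
"Language-Team: Oriya\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
EOF

# Pull the character set out of the Content-Type header line.
charset=$(sed -n 's/^"Content-Type:.*charset=\([A-Za-z0-9_-]*\).*/\1/p' sample.po)
echo "charset: $charset"

# Count lines that still carry the PACKAGE placeholder from the template.
placeholders=$(grep -c 'PACKAGE' sample.po)
echo "placeholder lines: $placeholders"
```

A non-zero placeholder count is a reminder that fields such as Project-Id-Version still need to be edited by hand.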

It is important to respect the format of the entries in the .po (portable object) file. Each entry has the following structure:

  WHITE-SPACE
  #  TRANSLATOR-COMMENTS
  #. AUTOMATIC-COMMENTS
  #: REFERENCE...
  #, FLAG...
  msgid UNTRANSLATED-STRING
  msgstr TRANSLATED-STRING
where the initial white-space (spaces, tabs, newlines, ...) and all comments may or may not exist for a particular entry. Comment lines start with a '#' as the first character, and there are two kinds: (i) manually added translator comments, which have some white-space immediately following the '#', and (ii) automatic comments added and maintained by the gettext tools, which have a non-white-space character after the '#'. The msgid line contains the untranslated (English) string, if there is one for that PO file entry, and the msgstr line is where the translated string is to be entered. More on this later. For details on the format of PO files, see gettext::Basics::PO Files:: in the Emacs info-browser (see Appendix A for an introduction to using the info-browser in Emacs).
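The distinction between the two kinds of comments can be checked mechanically by looking at the character after the '#'. The sketch below assumes a scratch file named entry.po (a made-up name); the translator comment and the c-format flag in it are illustrative additions and not output from the tutorial's hello.c.

```shell
#!/bin/sh
# A sample entry; the translator comment and the "#," flag line are
# illustrative additions, not taken from the tutorial's oriya.po.
cat > entry.po <<'EOF'

#  Greeting printed at startup
#: hello.c:10
#, c-format
msgid "Hello, world!\n"
msgstr ""
EOF

# Translator comments: '#' followed by white-space (or by nothing at all).
translator=$(grep -c -E '^#([[:space:]]|$)' entry.po)

# Automatic comments: '#' followed by a non-white-space character.
automatic=$(grep -c -E '^#[^[:space:]]' entry.po)

echo "translator=$translator automatic=$automatic"
```

Here the '#:' reference line and the '#,' flag line both count as automatic comments, while the line starting with '#' and a space counts as a translator comment.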
Gora Mohanty 2004-07-24