6.4. I18N

6.4.1. About

This document describes the Native Language Support (NLS) in Invenio.

6.4.2. Native Language Support information for administrators

Invenio is currently available in the following languages:

af = Afrikaans
ar = Arabic
bg = Bulgarian
ca = Catalan
cs = Czech
de = German
el = Greek
en = English
es = Spanish
fa = Persian (Farsi)
fr = French
gl = Galician
hr = Croatian
hu = Hungarian
it = Italian
ja = Japanese
ka = Georgian
lt = Lithuanian
no = Norwegian (Bokmål)
pl = Polish
pt = Portuguese
ro = Romanian
ru = Russian
rw = Kinyarwanda
sk = Slovak
sv = Swedish
uk = Ukrainian
zh_CN = Chinese (China)
zh_TW = Chinese (Taiwan)

If you are installing Invenio and you want to enable or disable some languages, please follow the standard installation procedure as described in the INSTALL file. The default language of the installation, as well as the list of all user-seen languages, can be selected in the general invenio.conf file; see the variables CFG_SITE_LANG and CFG_SITE_LANGS.

(Please note that some runtime Invenio daemons, such as webcoll, which updates the collection cache about once an hour, may take twice as long when twice as many user-seen languages are selected, because they create collection cache page elements for every user-seen language. Therefore, if you have defined thousands of collections and you find webcoll to be slow in your setup, you may want to limit the list of selected languages.)
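
The scaling described above can be sketched with a little arithmetic. This is only a rough model, assuming one cached page element per collection per language; the function name cache_elements and the figures are illustrative, not taken from webcoll itself.

```python
def cache_elements(nb_collections, langs):
    """Rough count of cached page elements: one per collection per language."""
    return nb_collections * len(langs)


two_langs = cache_elements(2000, ["en", "fr"])
four_langs = cache_elements(2000, ["en", "fr", "de", "it"])

# Doubling the number of user-seen languages doubles the work in this model.
print(two_langs, four_langs)  # 4000 8000
```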

6.4.3. Native Language Support information for translators

If you want to contribute a translation to Invenio, then please follow the procedure below:

  • Please check whether a po/LL.po file exists for your language,
    where LL stands for the ISO 639 language code (e.g. el for Greek). If such a file exists, then this language is already supported, in which case you may want to review the existing translation (see below). If the file does not exist yet, then you can create an empty one by copying the invenio.pot template file to LL.po, which you can then edit as described in the next item. (Please note that you would also have to translate some dynamic elements that are currently not located in the PO file; see "Introducing a new language" below.)

  • Please edit LL.po to review the existing translation. The PO file format is the standard GNU gettext one, so you can take advantage of the dedicated editing modes of programs such as GNU Emacs, KBabel, or poEdit to edit it. Pay special attention to strings marked as fuzzy and untranslated. (E.g. in the Emacs PO mode, press f and u to find them.) Do not forget to remove fuzzy marks for reviewed translations. (E.g. in the Emacs PO mode, press TAB to remove the fuzzy status of a string.)

  • After you are done with translations, please validate your file to make sure it does not contain formatting errors. (E.g. in the Emacs PO mode, press V to validate the file.)

  • If you have access to a test installation of Invenio, you may want to see your modified PO file in action:

    $ cd po
    $ emacs ja.po                      # edit Japanese translation
    $ make update-gmo
    $ make install
    $ sudo apachectl restart
    $ firefox http://your.site/?ln=ja  # check it out in context

    If you do not have access to a test installation, please contribute your PO file to the developers team (see the next step) and we shall install it on a test site and contact you so that you will be able to check your translation in the global context of the application.

    (Note to developers: you may need to run make update-gmo before make if the latter fails, even if you are not touching translations at all. The reason is that the gmo files are not stored in CVS, although they are included in the distribution tarball. So, if you are building from CVS and you do not have them in your tree, you may get build errors in directories like modules/webhelp/web/admin saying things like “No rule to make target index.bg.html”. The solution is to run make update-gmo to produce the gmo files before running make. End of note to developers.)

  • Please contribute your translation by emailing the file to <info@inveniosoftware.org>. Your help is greatly appreciated and will be properly credited in the THANKS file.

See also the GNU gettext manual, especially chapters 5, 6 and 11. <http://www.gnu.org/software/gettext/manual/html_chapter/gettext_toc.html>
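
If no PO-aware editor is at hand, the fuzzy and untranslated strings mentioned in the procedure above can also be located with a short script. The following is a minimal sketch, not an Invenio tool: the names collect and po_status are hypothetical, and the parser deliberately ignores plural forms, obsolete entries and escape sequences.

```python
import re


def collect(lines, keyword):
    """Join the quoted string following `keyword` with its continuation lines."""
    parts, active = [], False
    for line in lines:
        line = line.strip()
        if line.startswith(keyword + " "):
            active = True
            line = line[len(keyword):].strip()
        elif not line.startswith('"'):
            active = False
        if active and line.startswith('"') and line.endswith('"'):
            parts.append(line[1:-1])
    return "".join(parts)


def po_status(po_text):
    """Return (fuzzy, untranslated) lists of msgids found in PO-formatted text."""
    fuzzy, untranslated = [], []
    # Entries in a PO file are separated by blank lines.
    for entry in re.split(r"\n\s*\n", po_text.strip()):
        lines = entry.splitlines()
        msgid = collect(lines, "msgid")
        if msgid == "":
            continue  # header entry or comment-only block
        if any(l.startswith("#,") and "fuzzy" in l for l in lines):
            fuzzy.append(msgid)
        if collect(lines, "msgstr") == "":
            untranslated.append(msgid)
    return fuzzy, untranslated


sample = "\n".join([
    'msgid ""',
    'msgstr ""',
    '"Content-Type: text/plain; charset=UTF-8\\n"',
    '',
    '#, fuzzy',
    'msgid "Search Help"',
    'msgstr "Hilfe"',
    '',
    'msgid "Record"',
    'msgstr ""',
])
print(po_status(sample))  # (['Search Help'], ['Record'])
```

For real validation, the editor commands described above remain the better choice; the sketch only shows what "fuzzy" and "untranslated" mean at the file level.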

6.4.4. Native Language Support information for programmers

Invenio uses the standard GNU gettext I18N and L10N philosophy.

In Python programs, all output strings should be made translatable via the _() convention:

from messages import gettext_set_language
def square(x, ln=CFG_SITE_LANG):
    _ = gettext_set_language(ln)
    print _("Hello there!")
    print _("The square of %s is %s.") % (x, x*x)
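
For readers without an Invenio installation, the same _() convention can be tried out with the Python standard library alone. This is a minimal sketch under stated assumptions: DictTranslations, CATALOGS and gettext_set_language_sketch are hypothetical stand-ins for Invenio's own helper, and the German strings are illustrative.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """In-memory catalog; falls back to the original message when missing."""

    def __init__(self, messages):
        gettext.NullTranslations.__init__(self)
        self._messages = messages

    def gettext(self, message):
        return self._messages.get(message, message)


# Hypothetical catalogs keyed by language code.
CATALOGS = {
    "de": {
        "Hello there!": "Hallo!",
        "The square of %s is %s.": "Das Quadrat von %s ist %s.",
    },
}


def gettext_set_language_sketch(ln):
    """Return a _() function bound to the catalog of language `ln`."""
    return DictTranslations(CATALOGS.get(ln, {})).gettext


def square(x, ln="en"):
    _ = gettext_set_language_sketch(ln)
    return _("The square of %s is %s.") % (x, x * x)


print(square(3, "de"))  # Das Quadrat von 3 ist 9.
print(square(3, "en"))  # The square of 3 is 9.
```

The point of the pattern is that the message is looked up first and interpolated afterwards, so the catalog only ever contains the untouched format string.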

In webdoc source files, the convention is _()_:

_(Search Help)_
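
To illustrate how such markers could be expanded, here is a small sketch based on a regular expression. It is not Invenio's actual webdoc parser; the names WEBDOC_PHRASE and translate_webdoc, and the sample catalog, are assumptions made for this example.

```python
import re

# Matches _(...)_ and captures the phrase between the markers.
WEBDOC_PHRASE = re.compile(r"_\((.+?)\)_")


def translate_webdoc(text, translate):
    """Replace every _(phrase)_ marker with translate(phrase)."""
    return WEBDOC_PHRASE.sub(lambda m: translate(m.group(1)), text)


catalog = {"Search Help": "Suchhilfe"}  # illustrative German catalog


def lookup(phrase):
    # Untranslated phrases are shown in the original language.
    return catalog.get(phrase, phrase)


print(translate_webdoc("<h1>_(Search Help)_</h1>", lookup))  # <h1>Suchhilfe</h1>
```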

Here are some tips for writing easily translatable output messages:

  • Do not cut big phrases into several pieces; the meaning may be harder to grasp and to render properly in another language. Leave them in context. Do not try to economize by reusing standalone-translated words as parts of bigger sentences, because the translation could differ due to gender, for example. Define two sentences instead:

    not: _("This %s is not available.") % x,
         where x is either _("basket") or _("alert")
    but: _("This basket is not available.") and
         _("This alert is not available.")
  • If you print a single value in a translatable phrase, you can use an unnamed %i or %s string replacement placeholder:

    yes: _("There are %i baskets.") % nb_baskets

    But, as soon as you are printing more than one value, you should use named string placeholders because in some languages the parts of the sentence may be reversed when translated:

    not: _("There are %i baskets shared by %i groups.") % \
            (nb_baskets, nb_groups)
    but: _("There are %(x_nb_baskets)s baskets shared by %(x_nb_groups)s groups.") % \
            {'x_nb_baskets': nb_baskets, 'x_nb_groups': nb_groups,}

    Please use the x_ prefix for the named placeholder variables to ease the localization task of the translator.

  • Do not mix HTML presentation inside phrases. If you want to reserve space for HTML markup, please use generic replacement placeholders as prologue and epilogue:

    not: _("This is <b>cold</b>.")
    but: _("This is %(x_fmt_open)scold%(x_fmt_close)s.")

    Ditto for links:

    not: _("This is <a href="%s">homepage</a>.")
    but: _("This is %(x_url_open)shomepage%(x_url_close)s.")
  • Do not leave unnecessary things in short commonly used translatable expressions, such as extraneous spaces or colons before or after them. Rather put them in the business logic:

    not: _(" subject")
    but: " " + _("subject")
    not: _("Record %i:")
    but: _("Record") + " %i:" % recID

    On the other hand, in long sentences when the trailing punctuation has its meaning as an integral part of the label to be shown on the interface, you should leave them:

    not: _("Nearest terms in any collection are")
    but: _("Nearest terms in any collection are:")
  • Last but not least: the best is to follow the style of existing messages as a model, so that the translators are presented with a homogeneous and consistently presented output phrase set.
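
The tips on named placeholders and on keeping markup out of phrases can be tried out with a small sketch. The catalog below is hypothetical: it uses a reordered English sentence as a stand-in for a translation whose word order differs from the original.

```python
# Hypothetical catalog; the "translation" mentions groups before baskets.
CATALOG = {
    "There are %(x_nb_baskets)s baskets shared by %(x_nb_groups)s groups.":
        "%(x_nb_groups)s groups share %(x_nb_baskets)s baskets.",
    "There are %i baskets shared by %i groups.":
        "%i groups share %i baskets.",
}


def _(message):
    """Stand-in for the gettext lookup; falls back to the original."""
    return CATALOG.get(message, message)


nb_baskets, nb_groups = 3, 2

# Named placeholders survive the reordering.
named = _("There are %(x_nb_baskets)s baskets shared by %(x_nb_groups)s groups.") % {
    "x_nb_baskets": nb_baskets,
    "x_nb_groups": nb_groups,
}
print(named)  # 2 groups share 3 baskets.

# Unnamed placeholders are filled by position, so the values get swapped.
unnamed = _("There are %i baskets shared by %i groups.") % (nb_baskets, nb_groups)
print(unnamed)  # 3 groups share 2 baskets.

# Markup is supplied by the caller, so the phrase itself stays free of HTML.
cold = _("This is %(x_fmt_open)scold%(x_fmt_close)s.") % {
    "x_fmt_open": "<b>",
    "x_fmt_close": "</b>",
}
print(cold)  # This is <b>cold</b>.
```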

6.4.5. Introducing a new language

If you are introducing a new language for the first time, then please first create and edit the PO file as described above in Section 6.4.3. This covers the largest portion of the translation work, but it is not quite enough, because some dynamic elements that are not located in PO files currently have to be translated as well.

The development team can edit the respective files themselves if the translator sends the following translations by email:

  • demo server name, from invenio.conf:

    Atlantis Institute of Fictive Science
  • demo collection names, from democfgdata.sql:

    CERN Divisions
    CERN Experiments
    Theoretical Physics (TH)
    Experimental Physics (EP)
    Articles & Preprints
    Books & Reports
    Multimedia & Arts
    Atlantis Times News
    Atlantis Times Arts
    Atlantis Times Science
    Atlantis Times
    Atlantis Institute Books
    Atlantis Institute Articles
    Atlantis Times Drafts
    ALEPH Papers
    ALEPH Internal Notes
    ALEPH Theses
    ISOLDE Papers
    ISOLDE Internal Notes
  • demo right-hand-side portalbox, from democfgdata.sql:

    Welcome to the demo site of the Invenio, a free document server
    software coming from CERN. Please feel free to explore all the
    features of this demo site to the full.

The development team will then edit various files (po/LINGUAS, config files, sql files, plenty of Makefile files, etc.) as needed.

The last phase of the initial introduction of the new language would be to translate some short static HTML pages such as:

  • modules/webhelp/web/help-central.webdoc

Thanks for helping us to internationalize Invenio.

6.4.6. Integrating translation contributions

This section contains some tips on integrating translated phrases that were prepared for different Invenio releases. It is mostly of interest to Invenio developers or the release manager.

Imagine that we have a working translation file sk.po and that we have received a contribution sk-CONTRIB.po that was prepared for a previous Invenio release, so that the messages do not fully correspond. Moreover, another person might have worked on the sk.po file in the meantime. The goal is to integrate the contributions.

First, check whether the contributed file sk-CONTRIB.po was indeed prepared for a different software release:

$ msgcmp --use-fuzzy --use-untranslated sk-CONTRIB.po invenio.pot

If yes, then join its translations with the ones in the latest sk.po file:

$ msgcat sk-CONTRIB.po sk.po > sk-TMP.po

and update the message references:

$ msgmerge sk-TMP.po invenio.pot > sk-NEW.po

This will give the new file sk-NEW.po that should now be msgcmp’rable to invenio.pot.

Lastly, we have to go through sk-NEW.po manually in order to resolve potential translation conflicts (marked via #-#-#-#-# fuzzy translations). If the conflicts are evident and easy to resolve, for example corrected typos, we can fix them. If the conflicts are of a translational nature and cannot be resolved without consulting the translators, we should warn them about the conflicts. After the evident conflicts are resolved and the file validates okay, we can rename it to sk.po and we are done.

(Note that we could have passed the --use-first option to msgcat if we were sure that the first translation file (sk-CONTRIB.po) should be preferred as far as translation quality goes.)
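
The logic of the workflow above can be pictured with a toy model that treats each PO file as a dictionary from msgid to msgstr. This is only a sketch of the idea, not how msgcat or msgmerge are implemented; the function names and the sample phrases are hypothetical.

```python
def join_translations(contrib, current, use_first=False):
    """Toy counterpart of the msgcat step: join two catalogs.

    Entries translated differently in both files are reported as
    conflicts, unless use_first is set, in which case the first
    (contributed) translation wins.
    """
    merged, conflicts = {}, {}
    for msgid in sorted(set(contrib) | set(current)):
        first, second = contrib.get(msgid, ""), current.get(msgid, "")
        if first and second and first != second and not use_first:
            conflicts[msgid] = (first, second)
        else:
            merged[msgid] = first or second
    return merged, conflicts


def align_with_template(merged, template_msgids):
    """Toy counterpart of the msgmerge step: keep only the messages of
    the current template; messages without a translation stay empty."""
    return dict((msgid, merged.get(msgid, "")) for msgid in template_msgids)


contrib = {"Search": "Hladat", "Help": "Pomoc"}        # older contribution
current = {"Search": "Vyhladat", "Record": "Zaznam"}   # latest sk.po
template = ["Search", "Record", "New phrase"]          # invenio.pot

merged, conflicts = join_translations(contrib, current)
print(conflicts)  # {'Search': ('Hladat', 'Vyhladat')}
print(align_with_template(merged, template))
```

In this model, the conflicting "Search" entry is left empty until someone resolves it by hand, whereas the real tools keep both candidates in the file and mark the entry as fuzzy.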