Localization

Status of Localization

Localization currently lacks tooling to extract the localization tags mentioned in this document, and there is no plan to implement this tooling in the immediate future. To integrate localization into a project, follow the guidance in this document and expect to write the localization tooling yourself. A prototype implementation exists at https://gitlab-master.nvidia.com/carbon/Carbonite-externals/localization-tools, but its code parsing needs to be rewritten using clang’s parser API so that it can reliably parse code.

Localization Overview

When writing code that handles text that will be seen by an end-user, the Carbonite localization system (l10n::IL10n) must be used. In C++ code, this system must be used through the CARB_LOCALIZE() macro. For Python code, bindings are provided which do the equivalent of CARB_LOCALIZE().

Usage of CARB_LOCALIZE()

The full string literal must be passed directly to CARB_LOCALIZE(), rather than being stored in a separate variable. This allows the strings to be hashed at compile time, and it makes it easy to track which strings are used in UI code, so that they can be extracted from the code to compile a list of strings to send to a translation company.

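void uiCode(const char *arg)
{
    // very bad: arg is not a string literal, so it cannot be hashed at
    // compile time or extracted for translation
    otherUiCode(CARB_LOCALIZE(arg));

    // bad: the literal is stored in a separate variable instead of being
    // passed directly to the macro
    const char *const kUiString = "Operation Failed successfully";
    otherUiCode(CARB_LOCALIZE(kUiString));

    // good: the full string literal is passed directly to the macro
    otherUiCode(CARB_LOCALIZE("Operation Failed successfully"));
}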

The returned value from CARB_LOCALIZE() should never be cached; if system settings are changed, the returned string from CARB_LOCALIZE() may change, so caching values may lead to stale strings showing on the UI.

Do not use line continuators inside of strings. These will look strange when they show up in the table of translation entries sent to the translator. Overly long strings should be broken up using string concatenation (e.g. const char s[] = "concat" "enation";).

Avoid using hex escapes in strings when possible, since they can be troublesome. For example, "\x7f" "delete" would show up in the translation file as "\x7fdelete", where the escape sequence would be read as \x7fde rather than \x7f, which could cause parsing errors later on.

Usage of CARB_LOCALIZE() in scripts

The script bindings offer a function named get_localized_string(), which has behavior that is identical to the CARB_LOCALIZE() macro. Because scripts are not compiled, this function will hash the entire string on every call. If repeatedly hashing strings performs too poorly, the hash can be calculated ahead of time using get_hash_from_key_string(); then, the localized version of the hash can be looked up with get_localized_string_from_hash().

get_localized_string_from_hash() takes an additional string parameter, which is returned if that localization entry is missing. It is important that this string parameter still be the key string, so that end users still see readable US English text if the localization system fails.

Avoid using raw strings in Python for localization strings. The file sent to the translators will not show that the strings are raw strings, so the translators may be confused.
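The source of the confusion can be seen in a short sketch. The message text is made up for illustration. Once the r prefix is lost, both literals look identical on paper, but they are different strings at runtime.

```python
# A regular string: "\n" is a single newline character.
escaped = "Save failed.\nTry again."

# A raw string: "\n" is a backslash followed by the letter n.
raw = r"Save failed.\nTry again."

print(escaped == raw)           # the two strings are not equal
print(len(raw) - len(escaped))  # the raw string is one character longer
```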

These functions must be called through the carb.l10n namespace (e.g. carb.l10n.get_localized_string("string")), so that tooling can extract localization key strings.

Localization Key Strings

The strings passed into CARB_LOCALIZE() are referred to as key strings. The key strings are separate from the US English translation to allow the US English string to change without having to modify all occurrences of that key string in the code.

Key strings should be identical to their US English translation, if possible, because this makes the UI code easiest to read. It also keeps the UI usable (although likely odd looking) if the localization system is unable to load. Additionally, using shortened versions of the US English translations as key strings could result in unintended collisions, so that should be avoided entirely.

In the case where one US English string has multiple translations in another language depending on the context, the key strings for each translation should have a suffix added that disambiguates it.
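As an illustration of such suffixes, the sketch below uses a hypothetical in-memory table. The " (verb)" and " (adjective)" suffix style, the `TABLE` layout, and the `lookup()` helper are assumptions made for this example; the document only requires that the suffix disambiguates the key strings.

```python
# Hypothetical localization table: key string -> {language: text}.
# Both key strings share one US English text but differ in German.
TABLE = {
    "Open (verb)": {"en-US": "Open", "de-DE": "Öffnen"},
    "Open (adjective)": {"en-US": "Open", "de-DE": "Offen"},
}


def lookup(key, language):
    # Hypothetical helper: fall back to the key string when the entry or
    # the language is missing.
    return TABLE.get(key, {}).get(language, key)


for key in TABLE:
    print(key, "->", lookup(key, "en-US"), "/", lookup(key, "de-DE"))
```

This is also a case where the key string cannot be identical to its US English translation, since two distinct keys have to map to the same English text.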

Formatted Localized Strings

UI strings that need formatted information in them should use C++20 std::format style string formatting. This functionality is available in the ‘fmt’ package, which is available through packman. This formatting style is preferred here because an incorrect format string does not create security issues.

If a format string takes multiple formatted arguments, it should always use positional arguments. This makes the process of changing the argument order more obvious to translators. For example, "you were going {} km/h in a {} km/h zone" should instead be written as "you were going {0} km/h in a {1} km/h zone".
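The effect of positional arguments can be sketched in Python, whose str.format() uses the same {0}/{1} replacement-field syntax that the fmt library follows; C++ code would call fmt::format() instead. The reordered string below is a hypothetical stand-in for a translation whose grammar needs the arguments in a different order.

```python
# Key string as it would appear in code, using positional arguments.
key_string = "you were going {0} km/h in a {1} km/h zone"

# Hypothetical "translation" that needs the arguments in the opposite
# order (English is used here only as a placeholder).
reordered = "in a {1} km/h zone you were going {0} km/h"

speed, limit = 80, 50

# The call site passes the arguments in the same order either way.
print(key_string.format(speed, limit))
print(reordered.format(speed, limit))
```

Because each placeholder names its argument, the translator can move {0} and {1} freely without the calling code changing.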

Management of Localized Strings

New localized strings in code must be wrapped in a CARB_LOCALIZE() macro. The localization tool (localize.sh/localize.bat) will run through the codebase and extract all of the localized strings into the localization data file (localization.csv). It is the responsibility of the developers to ensure that all strings shown on the UI are wrapped with these macros or script bindings.

Note: the localization tool is a planned feature and is not yet available.