SETLOCALE(3) Library Functions Manual SETLOCALE(3)

setlocale - select character encoding

#include <locale.h>

char *
setlocale(int category, const char *locale);

The setlocale() function sets and retrieves the active locale for the current process. The locale modifies the behaviour of some functions in the C library with respect to the character encoding, and on other operating systems also with respect to some language and cultural conventions. For more information about locales in general, see the locale(1) manual page.

On OpenBSD, the only useful value for the category is LC_CTYPE. It sets the locale used for character encoding, character classification, and case conversion. For compatibility with natural language support in packages(7), all other categories — LC_COLLATE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and LC_TIME — can be set and retrieved, too, but their values are ignored by the OpenBSD C library. A category of LC_ALL sets the entire locale generically, which is strongly discouraged for security reasons in portable programs.

The syntax and semantics of the locale argument are not standardized and vary among operating systems. On OpenBSD, if the locale string ends with ".UTF-8", the UTF-8 locale is selected; otherwise, the "C" locale is selected, which uses the ASCII character set. If the locale contains a dot but does not end with ".UTF-8", setlocale() fails.
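The naming rule just described can be sketched as a small string classifier. This is a hypothetical helper written only from the description in this page, not the code used by the OpenBSD C library: the names classify_locale_name and enum locale_kind are invented for illustration, and an exact, case-sensitive match of the suffix is assumed. It expects a non-empty name; the empty string and NULL have their own meanings.

```c
#include <string.h>

enum locale_kind { LOCALE_C, LOCALE_UTF8, LOCALE_INVALID };

/*
 * Hypothetical helper: apply the documented OpenBSD naming rule
 * to a non-empty locale name.
 */
enum locale_kind
classify_locale_name(const char *name)
{
	static const char suffix[] = ".UTF-8";
	size_t len = strlen(name);
	size_t slen = sizeof(suffix) - 1;

	/* A name ending in ".UTF-8" selects the UTF-8 locale. */
	if (len >= slen && strcmp(name + len - slen, suffix) == 0)
		return LOCALE_UTF8;
	/* Any other name containing a dot requests an unsupported encoding. */
	if (strchr(name, '.') != NULL)
		return LOCALE_INVALID;
	/* Everything else selects the "C" locale. */
	return LOCALE_C;
}
```

Under these assumptions, "en_US.UTF-8" classifies as LOCALE_UTF8, "de_DE.ISO8859-1" as LOCALE_INVALID, and "en_US" as LOCALE_C.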

If locale is an empty string (""), the value of the environment variable LC_ALL, with a fallback to the variable corresponding to category, and with a further fallback to LANG, is used instead, as documented in the locale(1) manual page.

If locale is NULL, the locale remains unchanged. This can be used to determine the currently active locale.
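A minimal sketch of both special arguments follows: the empty string to pick up the user's environment, then NULL to report what is in force. The function name init_ctype_from_env is an assumption made for this example.

```c
#include <locale.h>
#include <stddef.h>

/*
 * Select the LC_CTYPE locale from the environment and report the
 * name of the locale that is active afterwards.
 */
const char *
init_ctype_from_env(void)
{
	/*
	 * "" consults LC_ALL, then LC_CTYPE, then LANG.  If the request
	 * is rejected, the previously active locale stays in force.
	 */
	(void)setlocale(LC_CTYPE, "");

	/* NULL changes nothing and only reports the active locale. */
	return setlocale(LC_CTYPE, NULL);
}
```

The returned name depends on the environment and on the operating system, so callers should not compare it against a fixed string.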

By default, C programs start in the "C" locale. The only function in the library that sets the locale is setlocale(); the locale is never changed as a side effect of some other routine.

The LC_CTYPE category modifies the behaviour of at least the following functions: iswctype(3), mblen(3), mbrlen(3), mbrtowc(3), mbsrtowcs(3), mbstowcs(3), mbtowc(3), towctrans(3), towlower(3), towupper(3), wcrtomb(3), wcscasecmp(3), wcsrtombs(3), wcstombs(3), wctomb(3), wctrans(3), wctype(3), and the functions documented in iswalnum(3).
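As an illustration of that dependency, the following sketch counts characters rather than bytes with mbrlen(3). The helper name count_chars is hypothetical. The result for non-ASCII input depends on the LC_CTYPE locale active at the time of the call; in the "C" locale such input may be rejected or counted byte by byte, depending on the system.

```c
#include <string.h>
#include <wchar.h>

/*
 * Count the multibyte characters in s according to the active
 * LC_CTYPE locale.  Returns -1 on an invalid or incomplete sequence.
 */
long
count_chars(const char *s)
{
	mbstate_t st;
	size_t len = strlen(s);
	size_t n;
	long count = 0;

	memset(&st, 0, sizeof(st));
	while (len > 0) {
		n = mbrlen(s, len, &st);
		if (n == (size_t)-1 || n == (size_t)-2)
			return -1;
		if (n == 0)	/* embedded NUL; cannot occur with strlen() */
			break;
		s += n;
		len -= n;
		count++;
	}
	return count;
}
```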

In case of success, setlocale() returns a pointer to a static string describing the locale that is in force after the call. Subsequent calls to setlocale() may change the content of the string. The format of the string is not standardized and varies among operating systems.
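Because later calls may overwrite the static string, a program that needs the name afterwards should copy it first. A minimal sketch, with the hypothetical helper name save_ctype_name:

```c
#include <locale.h>
#include <string.h>

/*
 * Copy the name of the active LC_CTYPE locale into buf.
 * Returns 0 on success, -1 if the query fails or buf is too small.
 */
int
save_ctype_name(char *buf, size_t buflen)
{
	const char *name = setlocale(LC_CTYPE, NULL);
	size_t len;

	if (name == NULL)
		return -1;
	len = strlen(name);
	if (len >= buflen)
		return -1;
	/* Copy the terminating NUL as well. */
	memcpy(buf, name, len + 1);
	return 0;
}
```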

On OpenBSD, if setlocale() was never called with a non-NULL locale argument, the string "C" is returned. Otherwise, if the category was not LC_ALL or if the locale is the same for all categories, a copy of the locale argument is returned. Otherwise, the locales for the six categories LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME are concatenated in that order, with slash (‘/’) characters in between.

In case of failure, setlocale() returns NULL. On OpenBSD, that can only happen if the category is invalid, if a character encoding other than UTF-8 is requested, if the requested locale name is of excessive length, or if memory allocation fails.
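The return value should therefore always be checked. A small wrapper along these lines is one way to do so; the name set_locale_checked and the wording of the diagnostic are assumptions made for this sketch.

```c
#include <locale.h>
#include <stdio.h>

/*
 * Call setlocale() and report a failure on standard error.
 * Returns 0 on success, -1 if setlocale() returned NULL.
 */
int
set_locale_checked(int category, const char *locale)
{
	if (setlocale(category, locale) == NULL) {
		fprintf(stderr, "setlocale: cannot select locale \"%s\"\n",
		    locale != NULL ? locale : "(null)");
		return -1;
	}
	return 0;
}
```

A program might call set_locale_checked(LC_CTYPE, "") early in main() and continue in the "C" locale if it returns -1.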

Calling

setlocale(LC_CTYPE, "en_US.UTF-8");

at the beginning of a program selects the UTF-8 locale and returns "en_US.UTF-8". Calling

setlocale(LC_ALL, NULL);

right afterwards leaves the locale unchanged and returns "C/en_US.UTF-8/C/C/C/C".
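The composite string can be taken apart with ordinary string functions. The following sketch extracts the LC_CTYPE field, assuming the OpenBSD format described above (six slash-separated fields with LC_CTYPE second, or a single name when all categories agree). The helper name ctype_from_lc_all is hypothetical, and the format is not portable to other systems.

```c
#include <string.h>

/*
 * Hypothetical helper: extract the LC_CTYPE field from an
 * OpenBSD-style LC_ALL string.  Returns 0 on success, -1 if the
 * string is malformed or the field does not fit into buf.
 */
int
ctype_from_lc_all(const char *all, char *buf, size_t buflen)
{
	const char *start = all;
	const char *end;
	size_t len;

	end = strchr(all, '/');
	if (end != NULL) {
		/* Composite form: LC_CTYPE is the second field. */
		start = end + 1;
		end = strchr(start, '/');
		if (end == NULL)
			return -1;
		len = (size_t)(end - start);
	} else {
		/* Single name: the same locale applies to every category. */
		len = strlen(all);
	}
	if (len >= buflen)
		return -1;
	memcpy(buf, start, len);
	buf[len] = '\0';
	return 0;
}
```

Where only the LC_CTYPE locale is of interest, calling setlocale(LC_CTYPE, NULL) directly is simpler and does not depend on the string format.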

locale(1), newlocale(3), nl_langinfo(3), uselocale(3)

The setlocale() function conforms to ANSI X3.159-1989 (“ANSI C89”).

The setlocale() function first appeared in 4.3BSD-Net/2.

On systems other than OpenBSD, calling setlocale() or uselocale(3) with a category other than LC_CTYPE can cause erratic behaviour of many library functions. For security reasons, make sure that portable programs only use LC_CTYPE.
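A minimal sketch of that recommendation: set only LC_CTYPE from the environment and confirm that LC_NUMERIC has been left alone. The function name init_locale_ctype_only is an assumption made for this example, and the check relies on the unchanged category being reported as "C".

```c
#include <locale.h>
#include <string.h>

/*
 * Set only LC_CTYPE from the environment.
 * Returns 1 if LC_NUMERIC is still reported as "C", 0 otherwise.
 */
int
init_locale_ctype_only(void)
{
	const char *numeric;

	/* Deliberately not LC_ALL: the other categories stay untouched. */
	(void)setlocale(LC_CTYPE, "");

	numeric = setlocale(LC_NUMERIC, NULL);
	return numeric != NULL && strcmp(numeric, "C") == 0;
}
```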

For example, the following functions may be affected. The list is probably incomplete. For example, additional library functions may be impacted if they directly or indirectly call affected functions, or if they attempt to imitate aspects of their behaviour. Functions that are not standardized may be affected too.

LC_COLLATE: glob(3), strcoll(3), strxfrm(3), wcscoll(3), wcsxfrm(3), and the functions documented in regexec(3)
LC_MESSAGES: catgets(3), catopen(3), nl_langinfo(3), perror(3), psignal(3), strerror(3), strsignal(3), and the functions documented in err(3)
LC_MONETARY: localeconv(3), nl_langinfo(3), strfmon()
LC_NUMERIC: atof(3), localeconv(3), nl_langinfo(3), strfmon(), and the functions documented in printf(3), scanf(3), strtod(3), wcstod(3), wprintf(3), wscanf(3). This category is particularly dangerous because it can cause bugs in the parsing and formatting of numbers, for example failures to recognize or properly write decimal points.
LC_TIME: getdate(), nl_langinfo(3), strftime(3), strptime(3). Similarly, this category is prone to causing bugs in the parsing and formatting of date strings.
LC_CTYPE: On systems other than OpenBSD, this category may affect the behaviour of additional functions, for example: btowc(3), isalnum(3), isalpha(3), isblank(3), iscntrl(3), isdigit(3), isgraph(3), islower(3), isprint(3), ispunct(3), isspace(3), isupper(3), isxdigit(3), mbsinit(3), strcasecmp(3), strcoll(3), strxfrm(3), tolower(3), toupper(3), vis(3), wcscoll(3), wcsxfrm(3), wctob(3)
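To illustrate the number parsing hazard: on a system where the numeric category has been switched to a locale with a comma as the decimal point, strtod(3) may stop at the '.' in "3.14" and silently return 3. The following sketch catches such partial conversions by checking the end pointer. The helper name parse_decimal is hypothetical; this is a defensive check, not a substitute for leaving the numeric category alone.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse s as a floating point number.
 * Returns 0 on success, -1 if s is not consumed completely
 * or the value is out of range.
 */
int
parse_decimal(const char *s, double *out)
{
	char *end;
	double v;

	errno = 0;
	v = strtod(s, &end);
	/* Reject empty input, trailing garbage, and range errors. */
	if (end == s || *end != '\0' || errno == ERANGE)
		return -1;
	*out = v;
	return 0;
}
```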
August 4, 2022 OpenBSD-current