locale




In computing, a locale is a set of parameters that defines the user's language, country and any special variant
preferences that the user wants to see in their user interface. Usually a locale identifier consists of at least a
language identifier and a region identifier.


Locale identifiers can be defined in several ways:


On Unix, Linux and other POSIX-type platforms, locale identifiers are defined similarly to the RFC 3066 definition, but
the locale variant modifier is defined differently, and the charset is included as part of the identifier. The format
is:

[language[_territory][.charset][@modifier]]

LC_CTYPE
Character classification and case conversion.
LC_COLLATE
Collation order.
LC_MONETARY
Monetary formatting.
LC_NUMERIC
Numeric, non-monetary formatting.
LC_TIME
Date and time formats.
LC_MESSAGES
Formats of informative and diagnostic messages and interactive responses.





Internationalization Variables



This section describes environment variables that are relevant to the operation of internationalized interfaces described in
IEEE Std 1003.1-2001.



Users may use the following environment variables to announce specific localization requirements to applications.
Applications can retrieve this information using the setlocale() function to initialize the correct behavior of the
internationalized interfaces. The descriptions of the internationalization environment variables describe the
resulting behavior only when the application locale is initialized in this way. The use of the internationalization
variables by utilities described in the Shell and Utilities volume of IEEE Std 1003.1-2001 is described in the
ENVIRONMENT VARIABLES section for those utilities in addition to the global effects described in this section.



LANG
This variable shall determine the locale category for native language, local customs, and coded character set in the
absence of the LC_ALL and other LC_* (LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME) environment
variables. This can be used by applications to determine the language to use for error messages and instructions,
collating sequences, date formats, and so on.

LC_ALL
This variable shall determine the values for all locale categories. The value of the LC_ALL environment variable has
precedence over any of the other environment variables starting with LC_ (LC_COLLATE, LC_CTYPE, LC_MESSAGES,
LC_MONETARY, LC_NUMERIC, LC_TIME) and the LANG environment variable.

LC_COLLATE
This variable shall determine the locale category for character collation. It determines collation information for
regular expressions and sorting, including equivalence classes and multi-character collating elements, in various
utilities and the strcoll() and strxfrm() functions. Additional semantics of this variable, if any, are
implementation-defined.

LC_CTYPE
This variable shall determine the locale category for character handling functions, such as tolower(), toupper(), and
isalpha(). This environment variable determines the interpretation of sequences of bytes of text data as characters
(for example, single as opposed to multi-byte characters), the classification of characters (for example, alpha, digit,
graph), and the behavior of character classes. Additional semantics of this variable, if any, are
implementation-defined.

LC_MESSAGES
This variable shall determine the locale category for processing affirmative and negative responses and the language
and cultural conventions in which messages should be written. [XSI] [Option Start] It also affects the behavior of the
catopen() function in determining the message catalog. [Option End] Additional semantics of this variable, if any, are
implementation-defined. The language and cultural conventions of diagnostic and informative messages whose format is
unspecified by IEEE Std 1003.1-2001 should be affected by the setting of LC_MESSAGES.

LC_MONETARY
This variable shall determine the locale category for monetary-related numeric formatting information. Additional
semantics of this variable, if any, are implementation-defined.

LC_NUMERIC
This variable shall determine the locale category for numeric formatting (for example, thousands separator and radix
character) information in various utilities as well as the formatted I/O operations in printf() and scanf() and the
string conversion functions in strtod(). Additional semantics of this variable, if any, are implementation-defined.

LC_TIME
This variable shall determine the locale category for date and time formatting information. It affects the behavior of
the time functions in strftime(). Additional semantics of this variable, if any, are implementation-defined.

NLSPATH
[XSI] [Option Start] This variable shall contain a sequence of templates that the catopen() function uses when
attempting to locate message catalogs. Each template consists of an optional prefix, one or more conversion
specifications, a filename, and an optional suffix.

For example:




NLSPATH="/system/nlslib/%N.cat"




defines that catopen() should look for all message catalogs in the directory
/system/nlslib, where the catalog name should be constructed from the name parameter passed to catopen() ( %N ), with the suffix .cat.



Conversion specifications consist of a '%' symbol, followed by a single-letter keyword. The following keywords are
currently defined:



%N
The value of the name parameter passed to catopen().
%L
The value of the LC_MESSAGES category.
%l
The language element from the LC_MESSAGES category.
%t
The territory element from the LC_MESSAGES category.
%c
The codeset element from the LC_MESSAGES category.
%%
A single '%' character.

An empty string is substituted if the specified value is not currently defined. The separators underscore ( '_' ) and
period ( '.' ) are not included in the %t and %c conversion specifications.



Templates defined in NLSPATH are separated by colons ( ':' ). A leading colon or two adjacent colons ( '::' ) is
equivalent to specifying %N. For example:




NLSPATH=":%N.cat:/nlslib/%L/%N.cat"




indicates to catopen() that it should look for the requested message catalog in
name, name.cat, and /nlslib/category/name.cat, where category is the
value of the LC_MESSAGES category of the current locale.
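
The template rules above can be sketched in a few lines of Python. This is only an illustration of the substitution rules as described here, not the way any real catopen() is implemented; the function name expand_nlspath and its parameters are made up for the example, and treating a trailing colon like a leading one is an assumption the text does not cover.

```python
def expand_nlspath(nlspath, name, lc_messages):
    """Return the candidate catalog paths for `name`, in search order."""
    # Split LC_MESSAGES of the form language[_territory][.codeset][@modifier].
    base = lc_messages.split("@", 1)[0]
    base, _, codeset = base.partition(".")
    language, _, territory = base.partition("_")
    values = {"N": name, "L": lc_messages, "l": language,
              "t": territory, "c": codeset, "%": "%"}

    paths = []
    for template in nlspath.split(":"):
        if template == "":
            # A leading colon or two adjacent colons stands for %N.
            # (Assumption: a trailing colon is handled the same way.)
            template = "%N"
        out, i = [], 0
        while i < len(template):
            if template[i] == "%" and i + 1 < len(template):
                # Undefined values and unknown keywords become "".
                out.append(values.get(template[i + 1], ""))
                i += 2
            else:
                out.append(template[i])
                i += 1
        paths.append("".join(out))
    return paths


if __name__ == "__main__":
    for path in expand_nlspath(":%N.cat:/nlslib/%L/%N.cat", "name", "fr_FR"):
        print(path)
```

Because the separators are dropped while splitting, %t and %c expand without the underscore and the period, as the text requires.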



Users should not set the NLSPATH variable unless they have a specific reason to override the default system path. Setting
NLSPATH to override the default system path produces undefined results in the standard utilities and in applications with
appropriate privileges. [Option End]




The environment variables LANG, LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME,
[XSI] [Option Start] and NLSPATH [Option End] provide for the support of internationalized applications. The standard
utilities shall make use of these environment variables as described in this section and the individual ENVIRONMENT
VARIABLES sections for the utilities. If these variables specify locale categories that are not based upon the same
underlying codeset, the results are unspecified.



The values of locale categories shall be determined by a precedence order; the first condition met below determines the
value:




  1. If the LC_ALL environment variable is defined and is not null, the value of LC_ALL shall be used.



  2. If the LC_* environment variable (LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME) is defined
    and is not null, the value of the environment variable shall be used to initialize the category that corresponds
    to the environment variable.



  3. If the LANG environment variable is defined and is not null, the value of the LANG environment variable shall be
    used.



  4. If the LANG environment variable is not set or is set to the empty string, the implementation-defined default locale
    shall be used.
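
The four-step precedence order can be written as a small lookup. The sketch below is only a model of the rules listed above: resolve_locale is a made-up name, the environment is passed in as a plain dictionary, and the fallback "C" stands in for the implementation-defined default locale, which the standard does not fix.

```python
CATEGORIES = ("LC_COLLATE", "LC_CTYPE", "LC_MESSAGES",
              "LC_MONETARY", "LC_NUMERIC", "LC_TIME")


def resolve_locale(env, category, default="C"):
    """Return the locale value that applies to one category.

    `default` is a placeholder for the implementation-defined default.
    """
    # The first variable that is defined and not null wins.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return default


if __name__ == "__main__":
    env = {"LANG": "fr_FR", "LC_COLLATE": "de_DE"}
    for category in CATEGORIES:
        print(category, "=", resolve_locale(env, category))
```

An empty string counts as "not set" at every step, which is why the loop tests the value for truthiness rather than for presence.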




If the locale value is "C" or "POSIX", the POSIX locale shall be used and the standard utilities behave in
accordance with the rules in POSIX Locale for the associated category.



If the locale value begins with a slash, it shall be interpreted as the pathname of a file that was created in the output format
used by the localedef utility; see OUTPUT FILES under localedef. Referencing such a pathname shall result in that locale being used for the
indicated category.



[XSI] [Option Start]
If the locale value has the form:




language[_territory][.codeset]




it refers to an implementation-provided locale, where settings of language, territory, and codeset are
implementation-defined.



LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and LC_TIME are defined to accept an additional field
@modifier, which allows the user to select a specific instance of localization data within a single category (for
example, for selecting the dictionary as opposed to the character ordering of data). The syntax for these environment
variables is thus defined as:




[language[_territory][.codeset][@modifier]]




For example, if a user wanted to interact with the system in French, but required to sort German text files, LANG and
LC_COLLATE could be defined as:




LANG=Fr_FR
LC_COLLATE=De_DE




This could be extended to select dictionary collation (say) by use of the @ modifier field; for example:




LC_COLLATE=De_DE@dict




[Option End]
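
As an illustration of the syntax [language[_territory][.codeset][@modifier]], the following sketch splits a value into its four fields. The names LocaleName and parse_locale_name are invented for the example; the sketch does not handle the special values "C" and "POSIX" or pathnames beginning with a slash, and it does not check whether the fields name anything the system actually provides.

```python
from typing import NamedTuple


class LocaleName(NamedTuple):
    language: str
    territory: str
    codeset: str
    modifier: str


def parse_locale_name(value):
    """Split language[_territory][.codeset][@modifier] into its fields.

    Missing optional fields come back as empty strings.
    """
    rest, _, modifier = value.partition("@")
    rest, _, codeset = rest.partition(".")
    language, _, territory = rest.partition("_")
    return LocaleName(language, territory, codeset, modifier)


if __name__ == "__main__":
    print(parse_locale_name("De_DE@dict"))
    print(parse_locale_name("zh_CN.UTF-8"))
```

The modifier is split off first because it is always the last field, whether or not a codeset is present.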

An implementation may support other formats.



If the locale value is not recognized by the implementation, the behavior is unspecified.



At runtime, these values are bound to a program's locale by calling the setlocale() function.



Additional criteria for determining a valid locale name are implementation-defined.








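On locale settings: why set a locale?


Locale is a very important concept in internationalization and localization. In my view, for Chinese users internationalization or localization usually involves three things: reading Chinese, writing Chinese, and compatibility and communication with Chinese Windows systems. In practice, the locale setting has little to do with reading Chinese, but it is closely tied to writing Chinese and to the way Windows partitions are mounted. Just as a purely English Windows installation can display Chinese, Japanese or Italian web pages, you do not need to set a locale in order to read Chinese. So why set a locale at all, and when is it actually used?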


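1. Why set a locale?

As I said above, setting a locale has no direct bearing on whether you can browse Chinese web pages. Even if you set the locale to en_US.ISO-8859-1, a standard English locale, you can still browse Chinese pages. As long as the system has the corresponding character set (and even that is not always required) and a suitable font (such as simsun), the browser can decode the page and show it to you in Chinese. The process is as follows: once the page has been transferred to your machine, the browser determines which character set the page is encoded in, looks in the font library for a font that suits that character set, and the text renderer then draws the text on screen.


Below I will occasionally compare a character set to a codebook, which I find makes some things easier to understand. If you dislike the metaphor, copy the text into any text editor and replace "codebook" with "character set".


Then why do pages sometimes show garbage, or nothing but boxes? In my view, garbage appears because the wrong character set has been selected (or the right one is not available). For example, if a page is encoded in UTF-8 and you insist on reading it as GB2312, the system looks up glyphs according to GB2312 and draws them, and the result is naturally a pile of garbage. In other words, you are decoding a telegram with the wrong codebook, so the content comes out scrambled. When a page shows some Chinese characters but many boxes, the characters that do display show that the browser has identified the encoding correctly and found the corresponding glyphs in the font library. However, not every font contains glyphs for every character in a character set, so the display is sometimes incomplete. Installing a more complete font that covers more character sets fixes this.



If I can already browse Chinese web pages, why do I still need to set a locale?


Have you ever wondered why the Chinese section of the official Gentoo forums is encoded in UTF-8 (even though people keep strongly recommending GB2312), while Sina uses GB2312, and the official Xorg site is even encoded in ISO-8859-15, yet I can browse all of them without having set any of those locales? It is as if you owned every codebook: whatever character set a site is encoded in, you can decode it with the codebooks you hold. The catch is that, although you can browse Chinese pages, the characters flowing through the operating system itself are still English. It is like being able to understand both English and Chinese when you hear them.

The fundamental problem is this: you cannot write Chinese.


When you decide to write something, the first decision is which language to use. For a computer, that means which character set to use: you must tell your Linux system which codebook you want to write with. Now you know why the GB2312 character set is needed to read Sina: its pages were written in GB2312.


To be able to enter Chinese on Linux, you need to set the system locale to a Chinese one (strictly speaking, the LC_CTYPE category of the locale), for example zh_CN.GB2312, zh_CN.GB18030 or zh_CN.UTF-8. Many people do not understand these odd-looking expressions. What do they specify? That is covered in detail below; for now it is enough to know that this is how a locale is written.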


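2. What exactly is a locale?

The word locale is usually translated into Chinese as "region" or "territory", but its meaning is considerably broader. A locale is the language environment in which software runs, defined according to the language the user uses, the country or region the user is in, and the local cultural conventions.


This environment can be divided into several categories according to the cultural conventions involved. They usually include: characters and their classification (LC_CTYPE), numbers (LC_NUMERIC), comparison and sorting conventions (LC_COLLATE), time display formats (LC_TIME), currency (LC_MONETARY), messages, mainly prompts, error messages, status messages, titles, labels, buttons and menus (LC_MESSAGES), the way personal names are written (LC_NAME), the way addresses are written (LC_ADDRESS), the way telephone numbers are written (LC_TELEPHONE), units of measurement (LC_MEASUREMENT), default paper size (LC_PAPER), and the locale's description of itself (LC_IDENTIFICATION).


So a locale captures the language habits, cultural traditions and everyday conventions of the people in a given region. A region's locale is defined in terms of these categories. The locale definition files are stored in /usr/share/i18n/locales; for example en_US, zh_CN and de_DE@euro are all locale definition files. They are plain text, so you can open them in a text editor and look at the contents, although apart from a limited number of comments you probably will not understand much of it, because characters are referenced by their Unicode code points.


A note on de_DE@euro: the part after @ is a modifier. You can therefore find two German locales:

/usr/share/i18n/locales/de_DE@euro
/usr/share/i18n/locales/de_DE

The @euro variants were introduced with the euro. Historically they differed from the base locale in the monetary settings (the euro as the currency), and they are conventionally built with the ISO-8859-15 character set; they do not define different sorting or comparison rules. de_DE is the base German locale.


That covers the first half of zh_CN.GB18030. What is the second half? Most Linux users know that it is the character set the system uses.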


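3. What is a character set?

A character set defines how characters, especially non-English characters, are encoded inside the system; this is what is commonly called the internal code. The character set definitions are stored in /usr/share/i18n/charmaps, and they too are indexed by Unicode number. Unicode assigns a single number to every known symbol. A character set, in the sense used here, is the way those symbols are encoded: how each character is represented when it is transmitted over a network or passed around inside the computer. Unicode is the static part, the numbering; the character set is the dynamic part, the concrete form in which each character is stored or transmitted. For example, the Unicode number U+59D0 stands for the character 姐 ("elder sister"), but whether that character is represented by two, three or four bytes is a matter for the character set.

UTF-8, for example, is the encoding in widest use today. It uses one byte for ASCII characters, including the basic Latin letters; two bytes for most other alphabetic scripts, such as accented Latin, Greek and Cyrillic letters; three bytes for the remaining characters of the Basic Multilingual Plane, which include the commonly used Chinese characters; and four bytes for everything else. GB2312, in its usual EUC-CN form, uses two bytes for each Chinese character and one byte for ASCII.

Unicode itself is a numbering and does not fix a storage size. The encoding that stores every character in four bytes is UTF-32. Keeping the numbering and the encoding apart matters when we come to mounting Windows partitions. Storing text in four bytes per character is very wasteful, because most of the time the computing world only needs the 26 letters, each of which fits in one byte. That is why UTF-8, UTF-16 and so on exist; otherwise a single universal encoding would have spared everyone a lot of trouble.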
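
The difference between a character's Unicode number and its encoded form is easy to check directly. The sketch below uses Python's built-in codecs to count the bytes each encoding needs for one character; encoded_lengths is a made-up helper name, and the choice of the little-endian UTF-16 and UTF-32 variants (which avoid a byte-order mark) is an assumption made to keep the counts clean.

```python
def encoded_lengths(char):
    """Return the number of bytes `char` occupies in several encodings."""
    encodings = ("utf-8", "gb2312", "utf-16-le", "utf-32-le")
    return {name: len(char.encode(name)) for name in encodings}


if __name__ == "__main__":
    # U+59D0 is the example character used in the text.
    print(hex(0x59D0), encoded_lengths(chr(0x59D0)))
    print("A", encoded_lengths("A"))
```

The same code point comes out as a different number of bytes in each encoding, while an ASCII letter stays at one byte in both UTF-8 and GB2312.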



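4. What does zh_CN.GB2312 actually say?

A locale is the language environment of running software. It comprises a language, a territory and a character set (codeset). A locale is written as language[_territory[.codeset]], so a locale is always tied to a particular character set. Some examples:

1. I speak Chinese, I am in the People's Republic of China, and I use the GB2312 character set.

zh_CN.GB2312 = Chinese_People's Republic of China.GB2312 character set


2. I speak Chinese, I am in the People's Republic of China, and I use the GB18030 character set.

zh_CN.GB18030 = Chinese_People's Republic of China.GB18030 character set


3. I speak Chinese, I am in Taiwan, and I use the Big5 character set.

zh_TW.BIG5 = Chinese_Taiwan.Big5 character set


4. I speak English, I am in Great Britain, and I use the ISO-8859-1 character set.

en_GB.ISO-8859-1 = English_Great Britain.ISO-8859-1 character set


5. I speak German, I am in Germany, I use the UTF-8 character set, and I want the euro modifier applied.

de_DE.UTF-8@euro = German_Germany.UTF-8 character set@euro modifier


Note that it is not de_DE@euro.UTF-8. The complete form of a locale name is therefore:

[language[_territory][.codeset][@modifier]]


Generated locales are placed under /usr/lib/locale/, one directory per locale. In other words, after creating the de_DE.UTF-8@euro locale, the directory /usr/lib/locale/de_DE.UTF-8@euro/ is created (glibc may normalize the spelling of the codeset in the directory name, for example to utf8), and it holds the compiled contents of that locale.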


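5. How to define your own locales

Generating locales on Gentoo is easy. First add userlocales to USE, then edit the locales.build file, which tells glibc which locales to generate.

Many people do not understand what each entry means, but after the explanation above it should be clear.

File: /etc/locales.build

en_US/ISO-8859-1
en_US.UTF-8/UTF-8
zh_CN/GB18030
zh_CN.GBK/GBK
zh_CN.GB2312/GB2312
zh_CN.UTF-8/UTF-8


That is my locales.build file. The entries mean the following:


en_US/ISO-8859-1: generate a locale named en_US using the ISO-8859-1 character set, and make it the default for the English_United States family of locales. In effect it is the same as en_US.ISO-8859-1/ISO-8859-1.


en_US.UTF-8/UTF-8: generate a locale named en_US.UTF-8 using the UTF-8 character set.


zh_CN/GB18030: generate a locale named zh_CN using the GB18030 character set, and make it the default for the Chinese_China family of locales. In effect it is the same as zh_CN.GB18030/GB18030.


zh_CN.GBK/GBK: generate a locale named zh_CN.GBK using the GBK character set.

zh_CN.GB2312/GB2312: generate a locale named zh_CN.GB2312 using the GB2312 character set.

zh_CN.UTF-8/UTF-8: generate a locale named zh_CN.UTF-8 using the UTF-8 character set.


About default locales: a default locale can be abbreviated to en_US or zh_CN. This is only a shorthand and has no other significance.


Gentoo's locale setup hides one thing, namely the tool that generates locales: localedef.

After glibc has been built you can use localedef to add more locales, which will also help you understand locales better. See the localedef man page for details.

$ localedef -f charmap -i locale-definition-file locale-name

For example:

$ localedef -f UTF-8 -i zh_CN zh_CN.UTF-8


This produces exactly the same result as the entry zh_CN.UTF-8/UTF-8 in locales.build.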
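
The correspondence between a locales.build entry and a localedef call can be made explicit with a short sketch. It is inferred only from the equivalence stated above and is not Gentoo's actual build logic; localedef_command is a made-up name, and deriving the -i argument by dropping the codeset from the locale name is an assumption.

```python
def localedef_command(entry):
    """Turn a 'name/charmap' entry into a localedef argument list."""
    name, _, charmap = entry.strip().partition("/")
    # Assumed rule: the definition file is the locale name without
    # its codeset part, keeping any @modifier.
    base, _, modifier = name.partition("@")
    source = base.partition(".")[0]
    if modifier:
        source += "@" + modifier
    return ["localedef", "-f", charmap, "-i", source, name]


if __name__ == "__main__":
    for entry in ("en_US/ISO-8859-1", "zh_CN.UTF-8/UTF-8"):
        print(" ".join(localedef_command(entry)))
```

The sketch only builds the argument list; it does not run localedef, which normally needs write access to the system locale directory.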



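6. The anatomy of a locale


We have just generated several locales, but for them to take effect you must tell the Linux system which locale (or locales) to use. That requires a little understanding of how locales work internally. As mentioned earlier, a locale is divided into 12 categories according to the cultural conventions involved:

1. Characters and their classification (LC_CTYPE)
2. Numbers (LC_NUMERIC)
3. Comparison and sorting conventions (LC_COLLATE)
4. Time display formats (LC_TIME)
5. Currency (LC_MONETARY)
6. Messages, mainly prompts, error messages, status messages, titles, labels, buttons and menus (LC_MESSAGES)
7. The way personal names are written (LC_NAME)
8. The way addresses are written (LC_ADDRESS)
9. The way telephone numbers are written (LC_TELEPHONE)
10. Units of measurement (LC_MEASUREMENT)
11. Default paper size (LC_PAPER)
12. The locale's description of itself (LC_IDENTIFICATION)


The category most closely tied to Chinese input is LC_CTYPE. LC_CTYPE specifies which characters are valid in the system and how they are classified: what counts as an uppercase letter or a lowercase letter, case conversion, punctuation, printable characters and other character attributes. The most important thing the zh_CN locale definition does is define the class of Chinese characters (class "hanzi"). It is of course described in terms of Unicode, and it makes Chinese characters legal, valid characters on a Linux system regardless of the character set they are encoded in.


LC_CTYPE
% This is a copy of the "i18n" LC_CTYPE with the following modifications:
% - Additional classes: hanzi

copy "i18n"

class "hanzi"; /
% <U3400>..<U4DBF>;/
<U4E00>..<U9FA5>;/
<UF92C>;<UF979>;<UF995>;<UF9E7>;<UF9F1>;<UFA0C>;<UFA0D>;<UFA0E>;/
<UFA0F>;<UFA11>;<UFA13>;<UFA14>;<UFA18>;<UFA1F>;<UFA20>;<UFA21>;/
<UFA23>;<UFA24>;<UFA27>;<UFA28>;<UFA29>
END LC_CTYPE


The en_US locale definition does not define hanzi, so Chinese characters are not valid characters there. To enter Chinese you must therefore use a locale that supports Chinese, that is zh_XX, such as zh_CN, zh_TW or zh_HK.


Another very important point is that the categories are independent of one another: LC_CTYPE, LC_COLLATE, LC_MESSAGES and so on can each be set to a different value according to the user's needs. For many users this is useful, even essential. For example, I need an English environment in which I can enter Chinese, so I set LC_CTYPE to zh_CN.GB18030 and everything else to en_US.UTF-8.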



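7. How do you set the locale?

Setting the locale means setting the 12 locale categories, that is, the 12 LC_* variables. Besides these 12 variables there are two more for convenience: LC_ALL and LANG. Their order of precedence is:

LC_ALL > LC_* > LANG

Put another way, LC_ALL is the top-level, forced setting, and LANG is the default.

1. If you set LC_ALL=zh_CN.UTF-8, then whatever LC_* and LANG are set to, they are overridden by LC_ALL and become zh_CN.UTF-8.

2. If you set LANG=zh_CN.UTF-8 and all the LC_* variables to en_US.UTF-8, and LC_ALL is not set, the system locale follows LC_*, that is en_US.UTF-8.

3. If you set LANG=zh_CN.UTF-8 and neither the LC_* variables nor LC_ALL are set, the system gives every LC_* the default value, which is the value of LANG: zh_CN.UTF-8.

4. If you set LANG=zh_CN.UTF-8 and LC_CTYPE=en_US.UTF-8, and the other LC_* variables and LC_ALL are not set, the resulting locale is LC_CTYPE=en_US.UTF-8, while LC_COLLATE, LC_MESSAGES and the rest take the default, the value of LANG: LC_COLLATE = LC_MESSAGES = ... = LC_PAPER = LANG = zh_CN.UTF-8.

So the locale is set like this:

1. If you want a purely Chinese system, set either LC_ALL=zh_CN.XXXX or LANG=zh_CN.XXXX. You can set both, but as explained above LC_ALL overrides every other locale setting, so there is no point.

2. If you only want to be able to enter Chinese while keeping menus, titles, system messages and so on in English, set LC_CTYPE=zh_CN.XXXX and LANG=en_US.XXXX. The result is LC_CTYPE=zh_CN.XXXX and LC_COLLATE = LC_MESSAGES = ... = LC_PAPER = LANG = en_US.XXXX.

3. If you feel like it, you can set each of the 12 LC_* variables individually and build a truly eccentric system:

LC_CTYPE=zh_CN.GBK (Chinese with the GBK character set)

LC_NUMERIC=en_GB.ISO-8859-1 (British number formatting)

LC_MEASUREMENT=de_DE.ISO-8859-15@euro (German units of measurement with the ISO-8859-15 character set)

plus Roman address formatting, American paper sizes, and so on. Probably nobody actually does this.

4. If you do nothing at all, that is, LC_ALL, LANG and the LC_* variables are all unset, the system uses POSIX as the locale, also known as the C locale.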






The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2004 Edition
Copyright © 2001-2004 The IEEE and The Open Group, All Rights reserved.




7. Locale


7.1 General


A locale is the definition of the subset of a user's environment that depends
on language and cultural conventions. It is made up from one or more categories.
Each category is identified by its name and controls specific aspects of the
behavior of components of the system. Category names correspond to the following
environment variable names:


LC_CTYPE
Character classification and case conversion.
LC_COLLATE
Collation order.
LC_MONETARY
Monetary formatting.
LC_NUMERIC
Numeric, non-monetary formatting.
LC_TIME
Date and time formats.
LC_MESSAGES
Formats of informative and diagnostic messages and interactive responses.

The standard utilities in the Shell and Utilities volume of
IEEE Std 1003.1-2001 shall base their behavior on the current locale, as defined
in the ENVIRONMENT VARIABLES section for each utility. The behavior of some of
the C-language functions defined in the System Interfaces volume of
IEEE Std 1003.1-2001 shall also be modified based on the current locale, as
defined by the last call to setlocale().


Locales other than those supplied by the implementation can be created via
the localedef utility, provided
that the _POSIX2_LOCALEDEF symbol is defined on the system. Even if localedef is not provided, all
implementations conforming to the System Interfaces volume of
IEEE Std 1003.1-2001 shall provide one or more locales that behave as described
in this chapter. The input to the utility is described in Locale Definition. The value that is used to specify a
locale when using environment variables shall be the string specified as the
name operand to the localedef utility when the locale
was created. The strings "C" and "POSIX" are reserved as
identifiers for the POSIX locale (see POSIX Locale).
When the value of a locale environment variable begins with a slash (
'/' ), it shall be interpreted as the pathname of the locale
definition; the type of file (regular, directory, and so on) used to store the
locale definition is implementation-defined. If the value does not begin with a
slash, the mechanism used to locate the locale is implementation-defined.


If different character sets are used by the locale categories, the results
achieved by an application utilizing these categories are undefined. Likewise,
if different codesets are used for the data being processed by interfaces whose
behavior is dependent on the current locale, or the codeset is different from
the codeset assumed when the locale was created, the result is also
undefined.


Applications can select the desired locale by invoking the setlocale() function (or
equivalent) with the appropriate value. If the function is invoked with an empty
string, such as:


setlocale(LC_ALL, "");


the value of the corresponding environment variable is used. If the
environment variable is unset or is set to the empty string, the implementation
shall set the appropriate environment as defined in Environment Variables.
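
The same call is available from Python, whose locale module wraps the C setlocale() function. The sketch below is one possible way to initialize a program's locale from the environment; init_locale is a made-up helper, and falling back to the POSIX locale when the requested value is rejected is a choice made for this example, not something the specification requires.

```python
import locale


def init_locale(value=""):
    """Bind the process locale and return the name that was set.

    An empty string means "take the value from the environment".
    Falls back to the POSIX ("C") locale if `value` is rejected.
    """
    try:
        return locale.setlocale(locale.LC_ALL, value)
    except locale.Error:
        return locale.setlocale(locale.LC_ALL, "C")


if __name__ == "__main__":
    print("locale in effect:", init_locale(""))
```

Until a call like this is made, the program runs in the POSIX locale regardless of what the environment variables say.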


7.2 POSIX Locale


Conforming systems shall provide a POSIX locale, also known as the C locale.
The behavior of standard utilities and functions in the POSIX locale shall be as
if the locale was defined via the localedef utility with input data
from the POSIX locale tables in Locale Definition.


The tables in Locale Definition describe the
characteristics and behavior of the POSIX locale for data consisting entirely of
characters from the portable character set and the control character set. For
other characters, the behavior is unspecified. For C-language programs, the
POSIX locale shall be the default locale when the setlocale() function is not
called.


The POSIX locale can be specified by assigning to the appropriate environment
variables the values "C" or "POSIX".


All implementations shall define a locale as the default locale, to be
invoked when no environment variables are set, or set to the empty string. This
default locale can be the POSIX locale or any other implementation-defined
locale. Some implementations may provide facilities for local installation
administrators to set the default locale, customizing it for each location.
IEEE Std 1003.1-2001 does not require such a facility.


7.3 Locale Definition


The capability to specify additional locales to those provided by an
implementation is optional, denoted by the _POSIX2_LOCALEDEF symbol. If the
option is not supported, only implementation-supplied locales are available.
Such locales shall be documented using the format specified in this section.


Locales can be described with the file format presented in this section. The
file format is that accepted by the localedef utility. For the
purposes of this section, the file is referred to as the "locale definition
file", but no locales shall be affected by this file unless it is processed by
localedef or some similar
mechanism. Any requirements in this section imposed upon the utility shall apply
to localedef or to any other
similar utility used to install locale information using the locale definition
file format described here.


The locale definition file shall contain one or more locale category source
definitions, and shall not contain more than one definition for the same locale
category. If the file contains source definitions for more than one category,
implementation-defined categories, if present, shall appear after the categories
defined by General. A category source definition
contains either the definition of a category or a copy directive. For a
description of the copy directive, see localedef. In the event that some
of the information for a locale category, as specified in this volume of
IEEE Std 1003.1-2001, is missing from the locale source definition, the behavior
of that category, if it is referenced, is unspecified.


A category source definition shall consist of a category header, a category
body, and a category trailer. A category header shall consist of the character
string naming of the category, beginning with the characters LC_ . The
category trailer shall consist of the string "END", followed by one or
more <blank>s and the string used in the corresponding category
header.
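
The header, body and trailer structure can be illustrated with a small reader that splits a locale definition file into its categories. This is a rough sketch and not how localedef parses its input: split_categories is a made-up name, and the sketch ignores line continuation, escape characters and the comment_char and escape_char directives, assuming only a fixed comment character.

```python
def split_categories(text, comment_char="#"):
    """Map each category name to the list of its body lines."""
    categories = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and comment lines are skipped.
        if not stripped or line.startswith(comment_char):
            continue
        if current is None:
            # Outside a category, only a header (LC_...) matters here.
            if stripped.startswith("LC_"):
                if stripped in categories:
                    raise ValueError("duplicate category " + stripped)
                current = stripped
                categories[current] = []
            continue
        parts = stripped.split()
        # Trailer: END, one or more blanks, then the header string.
        if parts[0] == "END" and parts[1:] == [current]:
            current = None
        else:
            categories[current].append(stripped)
    if current is not None:
        raise ValueError("missing trailer for " + current)
    return categories


if __name__ == "__main__":
    sample = 'LC_CTYPE\ncopy "i18n"\nEND LC_CTYPE\n'
    print(split_categories(sample))
```

Rejecting a second definition of the same category mirrors the rule that a file shall not contain more than one definition per category.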


The category body shall consist of one or more lines of text. Each line shall
contain an identifier, optionally followed by one or more operands. Identifiers
shall be either keywords, identifying a particular locale element, or collating
elements. In addition to the keywords defined in this volume of
IEEE Std 1003.1-2001, the source can contain implementation-defined keywords.
Each keyword within a locale shall have a unique name (that is, two categories
cannot have a commonly-named keyword); no keyword shall start with the
characters LC_ . Identifiers shall be separated from the operands by one
or more <blank>s.


Operands shall be characters, collating elements, or strings of characters.
Strings shall be enclosed in double-quotes. Literal double-quotes within strings
shall be preceded by the <escape character>, described below. When
a keyword is followed by more than one operand, the operands shall be separated
by semicolons; <blank>s shall be allowed both before and after a
semicolon.


The first category header in the file can be preceded by a line modifying the
comment character. It shall have the following format, starting in column 1:


"comment_char %c\n", <comment character>


The comment character shall default to the number sign ( '#' ).
Blank lines and lines containing the <comment character> in the
first position shall be ignored.


The first category header in the file can be preceded by a line modifying the
escape character to be used in the file. It shall have the following format,
starting in column 1:


"escape_char %c\n", <escape character>


The escape character shall default to backslash, which is the character used
in all examples shown in this volume of IEEE Std 1003.1-2001.


A line can be continued by placing an escape character as the last character
on the line; this continuation character shall be discarded from the input.
Although the implementation need not accept any one portion of a continued line
with a length exceeding {LINE_MAX} bytes, it shall place no limits on the
accumulated length of the continued line. Comment lines shall not be continued
on a subsequent line using an escaped <newline>.


Individual characters, characters in strings, and collating elements shall be
represented using symbolic names, as defined below. In addition, characters can
be represented using the characters themselves or as octal, hexadecimal, or
decimal constants. When non-symbolic notation is used, the resultant locale
definitions are in many cases not portable between systems. The left angle
bracket ( '<' ) is a reserved symbol, denoting the start of a
symbolic name; when used to represent itself it shall be preceded by the escape
character. The following rules apply to character representation:



  1. A character can be represented via a symbolic name, enclosed within angle
    brackets '<' and '>'. The symbolic name, including the
    angle brackets, shall exactly match a symbolic name defined in the charmap file
    specified via the localedef
    -f option, and it shall be replaced by a character value determined from
    the value associated with the symbolic name in the charmap file. The use of a
    symbolic name not found in the charmap file shall constitute an error, unless
    the category is LC_CTYPE or LC_COLLATE , in which case it shall
    constitute a warning condition (see localedef for a description of
    actions resulting from errors and warnings). The specification of a symbolic
    name in a collating-element or collating-symbol section that
    duplicates a symbolic name in the charmap file (if present) shall be an error.
    Use of the escape character or a right angle bracket within a symbolic name is
    invalid unless the character is preceded by the escape character.


    For example:


    <c>;<c-cedilla> "<M><a><y>"



  2. A character in the portable character set can be represented by the character
    itself, in which case the value of the character is implementation-defined.
    (Implementations may allow other characters to be represented as themselves, but
    such locale definitions are not portable.) Within a string, the double-quote
    character, the escape character, and the right angle bracket character shall be
    escaped (preceded by the escape character) to be interpreted as the character
    itself. Outside strings, the characters:


    , ; < > escape_char

    shall be escaped to be interpreted as the character itself.


    For example:


    c "May"



  3. A character can be represented as an octal constant. An octal constant shall
    be specified as the escape character followed by two or three octal digits. Each
    constant shall represent a byte value. Multi-byte values can be represented by
    concatenated constants specified in byte order with the last constant specifying
    the least significant byte of the character.


    For example:


    \143;\347;\143\150 "\115\141\171"



  4. A character can be represented as a hexadecimal constant. A hexadecimal
    constant shall be specified as the escape character followed by an 'x'
    followed by two hexadecimal digits. Each constant shall represent a byte value.
    Multi-byte values can be represented by concatenated constants specified in byte
    order with the last constant specifying the least significant byte of the
    character.


    For example:


    \x63;\xe7;\x63\x68 "\x4d\x61\x79"



  5. A character can be represented as a decimal constant. A decimal constant
    shall be specified as the escape character followed by a 'd' followed
    by two or three decimal digits. Each constant represents a byte value.
    Multi-byte values can be represented by concatenated constants specified in byte
    order with the last constant specifying the least significant byte of the
    character.


    For example:


    \d99;\d231;\d99\d104 "\d77\d97\d121"


Implementations may accept single-digit octal, decimal, or hexadecimal
constants following the escape character. Only characters existing in the
character set for which the locale definition is created shall be specified,
whether using symbolic names, the characters themselves, or octal, decimal, or
hexadecimal constants. If a charmap file is present, only characters defined in
the charmap can be specified using octal, decimal, or hexadecimal constants.
Symbolic names not present in the charmap file can be specified and shall be
ignored, as specified under item 1 above.
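
The three numeric notations can be decoded with one small routine. The sketch below follows the digit counts given in items 3 to 5 and assumes the default backslash escape character; decode_constants is a made-up name, and the single-digit forms that implementations may optionally accept are not handled.

```python
import re

# Backslash followed by: x + two hex digits, d + two or three decimal
# digits, or two or three octal digits.
_CONSTANT = re.compile(r"\\(?:x([0-9A-Fa-f]{2})|d([0-9]{2,3})|([0-7]{2,3}))")


def decode_constants(token):
    """Decode concatenated numeric constants into their byte values."""
    out = bytearray()
    pos = 0
    while pos < len(token):
        match = _CONSTANT.match(token, pos)
        if match is None:
            raise ValueError("not a numeric constant at offset %d" % pos)
        hex_digits, dec_digits, oct_digits = match.groups()
        if hex_digits is not None:
            value = int(hex_digits, 16)
        elif dec_digits is not None:
            value = int(dec_digits, 10)
        else:
            value = int(oct_digits, 8)
        if value > 0xFF:
            # Each constant represents a single byte value.
            raise ValueError("constant does not fit in one byte")
        out.append(value)
        pos = match.end()
    return bytes(out)


if __name__ == "__main__":
    for token in (r"\143\150", r"\x63\x68", r"\d99\d104"):
        print(token, "->", decode_constants(token))
```

Concatenated constants are appended in the order written, so the last constant ends up as the least significant byte of a multi-byte character.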


7.3.1 LC_CTYPE


The LC_CTYPE category shall define character classification, case
conversion, and other character attributes. In addition, a series of characters
can be represented by three adjacent periods representing an ellipsis symbol (
"..." ). The ellipsis specification shall be interpreted as meaning
that all values between the values preceding and following it represent valid
characters. The ellipsis specification shall be valid only within a single
encoded character set; that is, within a group of characters of the same size.
An ellipsis shall be interpreted as including in the list all characters with an
encoded value higher than the encoded value of the character preceding the
ellipsis and lower than the encoded value of the character following the
ellipsis.


For example:


\x30;...;\x39;


includes in the character class all characters with encoded values between
the endpoints.
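
The ellipsis rule can be sketched as a simple expansion over a list of operands. The function expand_class is a made-up name, operands are modelled as integers for single-byte encoded values, and the string "..." stands for the ellipsis symbol; multi-byte characters are outside the scope of this example.

```python
def expand_class(items):
    """Expand "..." entries into the encoded values they stand for."""
    result = []
    for index, item in enumerate(items):
        if item != "...":
            result.append(item)
            continue
        if index == 0 or index == len(items) - 1:
            raise ValueError("ellipsis needs a value on both sides")
        low, high = items[index - 1], items[index + 1]
        if not (isinstance(low, int) and isinstance(high, int) and low < high):
            raise ValueError("ellipsis endpoints must be ascending values")
        # Values strictly between the endpoints; the endpoints are
        # already listed as ordinary operands.
        result.extend(range(low + 1, high))
    return result


if __name__ == "__main__":
    print(bytes(expand_class([0x30, "...", 0x39])))
```

The ellipsis itself contributes only the values strictly between its neighbours, which matches the "higher than" and "lower than" wording of the rule.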


The following keywords shall be recognized. In the descriptions, the term
"automatically included" means that it shall not be an error either to include
or omit any of the referenced characters; the implementation provides them if
missing (even if the entire keyword is missing) and accepts them silently if
present. When the implementation automatically includes a missing character, it
shall have an encoded value dependent on the charmap file in effect (see the
description of the localedef
-f option); otherwise, it shall have a value derived from an
implementation-defined character mapping.


The character classes digit, xdigit, lower,
upper, and space have a set of automatically included characters.
These only need to be specified if the character values (that is, encoding)
differ from the implementation default values. It is not possible to define a
locale without these automatically included characters unless some
implementation extension is used to prevent their inclusion. Such a definition
would not be a proper superset of the C or POSIX locale and, thus, it might not
be possible for conforming applications to work properly.


copy
Specify the name of an existing locale which shall be used as the definition
of this category. If this keyword is specified, no other keyword shall be
specified.
upper
Define characters to be classified as uppercase letters.

In the POSIX locale, the 26 uppercase letters shall be included:


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


In a locale definition file, no character specified for the keywords
cntrl, digit, punct, or space shall be specified.
The uppercase letters <A> to <Z>, as defined in Character Set Description File (the
portable character set), are automatically included in this class.


lower
Define characters to be classified as lowercase letters.

In the POSIX locale, the 26 lowercase letters shall be included:


a b c d e f g h i j k l m n o p q r s t u v w x y z


In a locale definition file, no character specified for the keywords
cntrl, digit, punct, or space shall be specified.
The lowercase letters <a> to <z> of the portable character set are
automatically included in this class.


alpha
Define characters to be classified as letters.

In the POSIX locale, all characters in the classes upper and
lower shall be included.


In a locale definition file, no character specified for the keywords
cntrl, digit, punct, or space shall be specified.
Characters classified as either upper or lower are automatically
included in this class.


digit
Define the characters to be classified as numeric digits.

In the POSIX locale, only:


0 1 2 3 4 5 6 7 8 9


shall be included.


In a locale definition file, only the digits <zero>, <one>,
<two>, <three>, <four>, <five>, <six>,
<seven>, <eight>, and <nine> shall be specified, and in
contiguous ascending sequence by numerical value. The digits <zero> to
<nine> of the portable character set are automatically included in this
class.


alnum
Define characters to be classified as letters and numeric digits. Only the
characters specified for the alpha and digit keywords shall be
specified. Characters specified for the keywords alpha and digit
are automatically included in this class.
space
Define characters to be classified as white-space characters.

In the POSIX locale, at a minimum, the <space>, <form-feed>,
<newline>, <carriage-return>, <tab>, and <vertical-tab>
shall be included.


In a locale definition file, no character specified for the keywords
upper, lower, alpha, digit, graph, or
xdigit shall be specified. The <space>, <form-feed>,
<newline>, <carriage-return>, <tab>, and <vertical-tab>
of the portable character set, and any characters included in the class
blank are automatically included in this class.


cntrl
Define characters to be classified as control characters.

In the POSIX locale, no characters in classes alpha or print
shall be included.


In a locale definition file, no character specified for the keywords
upper, lower, alpha, digit, punct,
graph, print, or xdigit shall be specified.


punct
Define characters to be classified as punctuation characters.

In the POSIX locale, neither the <space> nor any characters in classes
alpha, digit, or cntrl shall be included.


In a locale definition file, no character specified for the keywords
upper, lower, alpha, digit, cntrl,
xdigit, or as the <space> shall be specified.


graph
Define characters to be classified as printable characters, not including
the <space>.

In the POSIX locale, all characters in classes alpha, digit,
and punct shall be included; no characters in class cntrl shall be
included.


In a locale definition file, characters specified for the keywords
upper, lower, alpha, digit, xdigit, and
punct are automatically included in this class. No character specified
for the keyword cntrl shall be specified.


print
Define characters to be classified as printable characters, including the
<space>.

In the POSIX locale, all characters in class graph shall be included;
no characters in class cntrl shall be included.


In a locale definition file, characters specified for the keywords
upper, lower, alpha, digit, xdigit,
punct, graph, and the <space> are automatically included in
this class. No character specified for the keyword cntrl shall be
specified.


xdigit
Define the characters to be classified as hexadecimal digits.

In the POSIX locale, only:


0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f


shall be included.


In a locale definition file, only the characters defined for the class
digit shall be specified, in contiguous ascending sequence by numerical
value, followed by one or more sets of six characters representing the
hexadecimal digits 10 to 15 inclusive, with each set in ascending order (for
example, <A>, <B>, <C>, <D>, <E>, <F>,
<a>, <b>, <c>, <d>, <e>, <f>). The digits
<zero> to <nine>, the uppercase letters <A> to <F>, and
the lowercase letters <a> to <f> of the portable character set are
automatically included in this class.


blank
Define characters to be classified as <blank>s.

In the POSIX locale, only the <space> and <tab> shall be
included.


In a locale definition file, the <space> and <tab> are
automatically included in this class.


charclass
Define one or more locale-specific character class names as strings
separated by semicolons. Each named character class can then be defined
subsequently in the LC_CTYPE definition. A character class name shall
consist of at least one and at most {CHARCLASS_NAME_MAX} bytes of alphanumeric
characters from the portable filename character set. The first character of a
character class name shall not be a digit. The name shall not match any of the
LC_CTYPE keywords defined in this volume of IEEE Std 1003.1-2001. Future
revisions of IEEE Std 1003.1-2001 will not specify any LC_CTYPE keywords
containing uppercase letters.
charclass-name
Define characters to be classified as belonging to the named locale-specific
character class. In the POSIX locale, locale-specific named character classes
need not exist.

If a class name is defined by a charclass keyword, but no characters
are subsequently assigned to it, this is not an error; it represents a class
without any characters belonging to it.


The charclass-name can be used as the property argument to the
wctype() function, in regular
expression and shell pattern-matching bracket expressions, and by the tr command.


toupper
Define the mapping of lowercase letters to uppercase letters.

In the POSIX locale, at a minimum, the 26 lowercase characters:


a b c d e f g h i j k l m n o p q r s t u v w x y z


shall be mapped to the corresponding 26 uppercase characters:


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


In a locale definition file, the operand shall consist of character pairs,
separated by semicolons. The characters in each character pair shall be
separated by a comma and the pair enclosed by parentheses. The first character
in each pair is the lowercase letter, the second the corresponding uppercase
letter. Only characters specified for the keywords lower and upper
shall be specified. The lowercase letters <a> to <z>, and their
corresponding uppercase letters <A> to <Z>, of the portable
character set are automatically included in this mapping, but only when the
toupper keyword is omitted from the locale definition.


tolower
Define the mapping of uppercase letters to lowercase letters.

In the POSIX locale, at a minimum, the 26 uppercase characters:


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


shall be mapped to the corresponding 26 lowercase characters:


a b c d e f g h i j k l m n o p q r s t u v w x y z


In a locale definition file, the operand shall consist of character pairs,
separated by semicolons. The characters in each character pair shall be
separated by a comma and the pair enclosed by parentheses. The first character
in each pair is the uppercase letter, the second the corresponding lowercase
letter. Only characters specified for the keywords lower and upper
shall be specified. If the tolower keyword is omitted from the locale
definition, the mapping is the reverse mapping of the one specified for
toupper.
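
The pair syntax and the default for an omitted tolower keyword can be sketched in Python as follows. The helper parse_pairs is hypothetical, and the sketch keeps symbolic names as plain strings rather than resolving them through a charmap.

```python
import re


def parse_pairs(operand: str) -> dict[str, str]:
    """Parse '(<a>,<A>);(<b>,<B>)' into {'a': 'A', 'b': 'B'}."""
    pairs = re.findall(r"\(<([^>]+)>,<([^>]+)>\)", operand)
    return dict(pairs)


toupper = parse_pairs("(<a>,<A>);(<b>,<B>);(<c>,<C>)")

# With no tolower keyword, the mapping is the reverse of the toupper mapping.
tolower = {upper: lower for lower, upper in toupper.items()}
print(tolower)
```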


The following table shows the character class combinations allowed:


Table: Valid Character Class Combinations

          Can Also Belong To
In Class  upper  lower  alpha  digit  space  cntrl  punct  graph  print  xdigit  blank
upper            -      A      x      x      x      x      A      A      -       x
lower     -             A      x      x      x      x      A      A      -       x
alpha     -      -             x      x      x      x      A      A      -       x
digit     x      x      x             x      x      x      A      A      A       x
space     x      x      x      x             -      *      *      *      x       -
cntrl     x      x      x      x      -             x      x      x      x       -
punct     x      x      x      x      -      x             A      A      x       -
graph     -      -      -      -      -      x      -             A      -       -
print     -      -      -      -      -      x      -      -             -       -
xdigit    -      -      -      -      x      x      x      A      A              x
blank     x      x      x      x      A      -      *      *      *      x



Notes:


  1. Explanation of codes:


    A  Automatically included; see text.
    -  Permitted.
    x  Mutually-exclusive.
    *  See note 2.

  2. The <space>, which is part of the space and blank
    classes, cannot belong to punct or graph, but shall automatically
    belong to the print class. Other space or blank characters
    can be classified as any of punct, graph, or
    print.
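
A few of the table's rules can be spot-checked for the POSIX locale with a short Python sketch. The class sets are built from the constants in Python's string module, on the assumption that they match the portable character set; the names classes and exclusive_pairs are made up for this illustration.

```python
import string

# POSIX-locale classes, built from Python's ASCII constants (an assumption
# that they match the portable character set).
classes = {
    "upper": set(string.ascii_uppercase),
    "lower": set(string.ascii_lowercase),
    "digit": set(string.digits),
    "punct": set(string.punctuation),
    "space": set(" \f\n\r\t\v"),
    "blank": set(" \t"),
    "cntrl": {chr(c) for c in range(0x20)} | {"\x7f"},
}
# Memberships marked "A" in the table are derived rather than listed by hand.
classes["alpha"] = classes["upper"] | classes["lower"]
classes["graph"] = classes["alpha"] | classes["digit"] | classes["punct"]
classes["print"] = classes["graph"] | {" "}

# A few of the pairs the table marks "x" (mutually exclusive).
exclusive_pairs = [
    ("upper", "digit"), ("digit", "space"), ("punct", "alpha"),
    ("cntrl", "punct"), ("cntrl", "graph"), ("cntrl", "print"),
]
violations = [(a, b) for a, b in exclusive_pairs if classes[a] & classes[b]]
print(violations)  # []
```

Note 2 is visible in the same data: the space character is in print but not in graph or punct.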


LC_CTYPE Category in the POSIX Locale

The character classifications for the POSIX locale follow; the code listing
depicts the localedef input,
and the table represents the same information, sorted by character.

LC_CTYPE
# The following is the POSIX locale LC_CTYPE.
# "alpha" is by default "upper" and "lower"
# "alnum" is by definition "alpha" and "digit"
# "print" is by default "alnum", "punct", and the <space>
# "graph" is by default "alnum" and "punct"
#
upper <A>;<B>;<C>;<D>;<E>;<F>;<G>;<H>;<I>;<J>;<K>;<L>;<M>;\
<N>;<O>;<P>;<Q>;<R>;<S>;<T>;<U>;<V>;<W>;<X>;<Y>;<Z>
#
lower <a>;<b>;<c>;<d>;<e>;<f>;<g>;<h>;<i>;<j>;<k>;<l>;<m>;\
<n>;<o>;<p>;<q>;<r>;<s>;<t>;<u>;<v>;<w>;<x>;<y>;<z>
#
digit <zero>;<one>;<two>;<three>;<four>;<five>;<six>;\
<seven>;<eight>;<nine>
#
space <tab>;<newline>;<vertical-tab>;<form-feed>;\
<carriage-return>;<space>
#
cntrl <alert>;<backspace>;<tab>;<newline>;<vertical-tab>;\
<form-feed>;<carriage-return>;\
<NUL>;<SOH>;<STX>;<ETX>;<EOT>;<ENQ>;<ACK>;<SO>;\
<SI>;<DLE>;<DC1>;<DC2>;<DC3>;<DC4>;<NAK>;<SYN>;\
<ETB>;<CAN>;<EM>;<SUB>;<ESC>;<IS4>;<IS3>;<IS2>;\
<IS1>;<DEL>
#
punct <exclamation-mark>;<quotation-mark>;<number-sign>;\
<dollar-sign>;<percent-sign>;<ampersand>;<apostrophe>;\
<left-parenthesis>;<right-parenthesis>;<asterisk>;\
<plus-sign>;<comma>;<hyphen>;<period>;<slash>;\
<colon>;<semicolon>;<less-than-sign>;<equals-sign>;\
<greater-than-sign>;<question-mark>;<commercial-at>;\
<left-square-bracket>;<backslash>;<right-square-bracket>;\
<circumflex>;<underscore>;<grave-accent>;<left-curly-bracket>;\
<vertical-line>;<right-curly-bracket>;<tilde>
#
xdigit <zero>;<one>;<two>;<three>;<four>;<five>;<six>;<seven>;\
<eight>;<nine>;<A>;<B>;<C>;<D>;<E>;<F>;<a>;<b>;<c>;<d>;<e>;<f>
#
blank <space>;<tab>
#
toupper (<a>,<A>);(<b>,<B>);(<c>,<C>);(<d>,<D>);(<e>,<E>);\
(<f>,<F>);(<g>,<G>);(<h>,<H>);(<i>,<I>);(<j>,<J>);\
(<k>,<K>);(<l>,<L>);(<m>,<M>);(<n>,<N>);(<o>,<O>);\
(<p>,<P>);(<q>,<Q>);(<r>,<R>);(<s>,<S>);(<t>,<T>);\
(<u>,<U>);(<v>,<V>);(<w>,<W>);(<x>,<X>);(<y>,<Y>);(<z>,<Z>)
#
tolower (<A>,<a>);(<B>,<b>);(<C>,<c>);(<D>,<d>);(<E>,<e>);\
(<F>,<f>);(<G>,<g>);(<H>,<h>);(<I>,<i>);(<J>,<j>);\
(<K>,<k>);(<L>,<l>);(<M>,<m>);(<N>,<n>);(<O>,<o>);\
(<P>,<p>);(<Q>,<q>);(<R>,<r>);(<S>,<s>);(<T>,<t>);\
(<U>,<u>);(<V>,<v>);(<W>,<w>);(<X>,<x>);(<Y>,<y>);(<Z>,<z>)
END LC_CTYPE


Symbolic Name           Other Case  Character Classes
<NUL>                               cntrl
<SOH>                               cntrl
<STX>                               cntrl
<ETX>                               cntrl
<EOT>                               cntrl
<ENQ>                               cntrl
<ACK>                               cntrl
<alert>                             cntrl
<backspace>                         cntrl
<tab>                               cntrl, space, blank
<newline>                           cntrl, space
<vertical-tab>                      cntrl, space
<form-feed>                         cntrl, space
<carriage-return>                   cntrl, space
<SO>                                cntrl
<SI>                                cntrl
<DLE>                               cntrl
<DC1>                               cntrl
<DC2>                               cntrl
<DC3>                               cntrl
<DC4>                               cntrl
<NAK>                               cntrl
<SYN>                               cntrl
<ETB>                               cntrl
<CAN>                               cntrl
<EM>                                cntrl
<SUB>                               cntrl
<ESC>                               cntrl
<IS4>                               cntrl
<IS3>                               cntrl
<IS2>                               cntrl
<IS1>                               cntrl
<space>                             space, print, blank
<exclamation-mark>                  punct, print, graph
<quotation-mark>                    punct, print, graph
<number-sign>                       punct, print, graph
<dollar-sign>                       punct, print, graph
<percent-sign>                      punct, print, graph
<ampersand>                         punct, print, graph
<apostrophe>                        punct, print, graph
<left-parenthesis>                  punct, print, graph
<right-parenthesis>                 punct, print, graph
<asterisk>                          punct, print, graph
<plus-sign>                         punct, print, graph
<comma>                             punct, print, graph
<hyphen>                            punct, print, graph
<period>                            punct, print, graph
<slash>                             punct, print, graph
<zero>                              digit, xdigit, print, graph
<one>                               digit, xdigit, print, graph
<two>                               digit, xdigit, print, graph
<three>                             digit, xdigit, print, graph
<four>                              digit, xdigit, print, graph
<five>                              digit, xdigit, print, graph
<six>                               digit, xdigit, print, graph
<seven>                             digit, xdigit, print, graph
<eight>                             digit, xdigit, print, graph
<nine>                              digit, xdigit, print, graph
<colon>                             punct, print, graph
<semicolon>                         punct, print, graph
<less-than-sign>                    punct, print, graph
<equals-sign>                       punct, print, graph
<greater-than-sign>                 punct, print, graph
<question-mark>                     punct, print, graph
<commercial-at>                     punct, print, graph
<A>                     <a>         upper, xdigit, alpha, print, graph
<B>                     <b>         upper, xdigit, alpha, print, graph
<C>                     <c>         upper, xdigit, alpha, print, graph
<D>                     <d>         upper, xdigit, alpha, print, graph
<E>                     <e>         upper, xdigit, alpha, print, graph
<F>                     <f>         upper, xdigit, alpha, print, graph
<G>                     <g>         upper, alpha, print, graph
<H>                     <h>         upper, alpha, print, graph
<I>                     <i>         upper, alpha, print, graph
<J>                     <j>         upper, alpha, print, graph
<K>                     <k>         upper, alpha, print, graph
<L>                     <l>         upper, alpha, print, graph
<M>                     <m>         upper, alpha, print, graph
<N>                     <n>         upper, alpha, print, graph
<O>                     <o>         upper, alpha, print, graph
<P>                     <p>         upper, alpha, print, graph
<Q>                     <q>         upper, alpha, print, graph
<R>                     <r>         upper, alpha, print, graph
<S>                     <s>         upper, alpha, print, graph
<T>                     <t>         upper, alpha, print, graph
<U>                     <u>         upper, alpha, print, graph
<V>                     <v>         upper, alpha, print, graph
<W>                     <w>         upper, alpha, print, graph
<X>                     <x>         upper, alpha, print, graph
<Y>                     <y>         upper, alpha, print, graph
<Z>                     <z>         upper, alpha, print, graph
<left-square-bracket>               punct, print, graph
<backslash>                         punct, print, graph
<right-square-bracket>              punct, print, graph
<circumflex>                        punct, print, graph
<underscore>                        punct, print, graph
<grave-accent>                      punct, print, graph
<a>                     <A>         lower, xdigit, alpha, print, graph
<b>                     <B>         lower, xdigit, alpha, print, graph
<c>                     <C>         lower, xdigit, alpha, print, graph
<d>                     <D>         lower, xdigit, alpha, print, graph
<e>                     <E>         lower, xdigit, alpha, print, graph
<f>                     <F>         lower, xdigit, alpha, print, graph
<g>                     <G>         lower, alpha, print, graph
<h>                     <H>         lower, alpha, print, graph
<i>                     <I>         lower, alpha, print, graph
<j>                     <J>         lower, alpha, print, graph
<k>                     <K>         lower, alpha, print, graph
<l>                     <L>         lower, alpha, print, graph
<m>                     <M>         lower, alpha, print, graph
<n>                     <N>         lower, alpha, print, graph
<o>                     <O>         lower, alpha, print, graph
<p>                     <P>         lower, alpha, print, graph
<q>                     <Q>         lower, alpha, print, graph
<r>                     <R>         lower, alpha, print, graph
<s>                     <S>         lower, alpha, print, graph
<t>                     <T>         lower, alpha, print, graph
<u>                     <U>         lower, alpha, print, graph
<v>                     <V>         lower, alpha, print, graph
<w>                     <W>         lower, alpha, print, graph
<x>                     <X>         lower, alpha, print, graph
<y>                     <Y>         lower, alpha, print, graph
<z>                     <Z>         lower, alpha, print, graph
<left-curly-bracket>                punct, print, graph
<vertical-line>                     punct, print, graph
<right-curly-bracket>               punct, print, graph
<tilde>                             punct, print, graph
<DEL>                               cntrl


7.3.2 LC_COLLATE


The LC_COLLATE category provides a collation sequence definition for
numerous utilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 (
sort, uniq, and so on), regular expression
matching (see Regular Expressions),
and the strcoll(), strxfrm(), wcscoll(), and wcsxfrm() functions in the System
Interfaces volume of IEEE Std 1003.1-2001.


A collation sequence definition shall define the relative order between
collating elements (characters and multi-character collating elements) in the
locale. This order is expressed in terms of collation values; that is, by
assigning each element one or more collation values (also known as collation
weights). This does not imply that implementations shall assign such values, but
that ordering of strings using the resultant collation definition in the locale
behaves as if such assignment is done and used in the collation process. At
least the following capabilities are provided:



  1. Multi-character collating elements. Specification of multi-character
    collating elements (that is, sequences of two or more characters to be collated
    as an entity).



  2. User-defined ordering of collating elements. Each collating element
    shall be assigned a collation value defining its order in the character (or
    basic) collation sequence. This ordering is used by regular expressions and
    pattern matching and, unless collation weights are explicitly specified, also as
    the collation weight to be used in sorting.



  3. Multiple weights and equivalence classes. Collating elements can be
    assigned one or more (up to the limit {COLL_WEIGHTS_MAX}, as defined in <limits.h>) collating weights for use in
    sorting. The first weight is hereafter referred to as the primary weight.



  4. One-to-many mapping. A single character is mapped into a string of
    collating elements.



  5. Equivalence class definition. Two or more collating elements have the
    same collation value (primary weight).



  6. Ordering by weights. When two strings are compared to determine their
    relative order, the two strings are first broken up into a series of collating
    elements; the elements in each successive pair of elements are then compared
    according to the relative primary weights for the elements. If equal, and more
    than one weight has been assigned, then the pairs of collating elements are
    re-compared according to the relative subsequent weights, until either a pair of
    collating elements compare unequal or the weights are exhausted.
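
The ordering described in item 6 can be sketched in Python as follows. The WEIGHTS table and the compare function are hypothetical; the sketch assumes single-character collating elements, forward comparison at every level, and no ignored elements.

```python
# Hypothetical weight table: element -> (primary weight, secondary weight).
WEIGHTS = {
    "a": (1, 1),
    "A": (1, 2),
    "b": (2, 1),
    "B": (2, 2),
}


def compare(s1: str, s2: str) -> int:
    """Return -1, 0 or 1, comparing all primary weights before any secondary ones."""
    levels = len(next(iter(WEIGHTS.values())))
    for level in range(levels):
        k1 = [WEIGHTS[ch][level] for ch in s1]
        k2 = [WEIGHTS[ch][level] for ch in s2]
        if k1 != k2:
            return -1 if k1 < k2 else 1
    return 0  # equal at every weight level


print(compare("ab", "Ab"))  # -1: primary weights tie, secondary weights decide
print(compare("Ab", "b"))   # -1: decided by the primary weights alone
```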


The following keywords shall be recognized in a collation sequence
definition. They are described in detail in the following sections.


copy
Specify the name of an existing locale which shall be used as the definition
of this category. If this keyword is specified, no other keyword shall be
specified.
collating-element
Define a collating-element symbol representing a multi-character collating
element. This keyword is optional.
collating-symbol
Define a collating symbol for use in collation order statements. This
keyword is optional.
order_start
Define collation rules. This statement shall be followed by one or more
collation order statements, assigning character collation values and collation
weights to collating elements.
order_end
Specify the end of the collation-order statements.

The collating-element Keyword

In addition to the collating elements in the character set, the
collating-element keyword can be used to define multi-character collating
elements. The syntax is as follows:


"collating-element %s from \"%s\"\n", <collating-symbol>, <string>


The <collating-symbol> operand shall be a symbolic name,
enclosed between angle brackets ( '<' and '>' ), and
shall not duplicate any symbolic name in the current charmap file (if any), or
any other symbolic name defined in this collation definition. The string operand
is a string of two or more characters that collates as an entity. A
<collating-element> defined via this keyword is only recognized
within the LC_COLLATE category.


For example:


collating-element <ch> from "<c><h>"
collating-element <e-acute> from "<acute><e>"
collating-element <ll> from "ll"
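
To show what collating as an entity means in practice, the Python sketch below splits a string into collating elements. The function split_elements is hypothetical, and the greedy longest-match strategy is an assumption made for the illustration; the text above does not prescribe a matching algorithm.

```python
def split_elements(text: str, multi: set[str]) -> list[str]:
    """Greedily split text, preferring the longest multi-character element."""
    longest = max((len(m) for m in multi), default=1)
    out: list[str] = []
    i = 0
    while i < len(text):
        for size in range(longest, 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no multi-character element fits.
            if size == 1 or piece in multi:
                out.append(piece)
                i += len(piece)
                break
    return out


print(split_elements("chill", {"ch", "ll"}))  # ['ch', 'i', 'll']
```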


The collating-symbol Keyword

This keyword shall be used to define symbols for use in collation sequence
statements; that is, between the order_start and the order_end
keywords. The syntax is as follows:


"collating-symbol %s\n", <collating-symbol>


The <collating-symbol> shall be a symbolic name, enclosed
between angle brackets ( '<' and '>' ), and shall not
duplicate any symbolic name in the current charmap file (if any), or any other
symbolic name defined in this collation definition. A
<collating-symbol> defined via this keyword is only recognized
within the LC_COLLATE category.


For example:


collating-symbol <UPPER_CASE>
collating-symbol <HIGH>


The collating-symbol keyword defines a symbolic name that can be
associated with a relative position in the character order sequence. While such
a symbolic name does not represent any collating element, it can be used as a
weight.


The order_start Keyword

The order_start keyword shall precede collation order entries and also
define the number of weights for this collation sequence definition and other
collation rules. The syntax is as follows:


"order_start %s;%s;...;%s\n", <sort-rules>, <sort-rules> ...


The operands to the order_start keyword are optional. If present, the
operands define rules to be applied when strings are compared. The number of
operands defines how many weights each element is assigned; if no operands are
present, one forward operand is assumed. If present, the first operand
defines rules to be applied when comparing strings using the first (primary)
weight; the second when comparing strings using the second weight, and so on.
Operands shall be separated by semicolons ( ';' ). Each operand shall
consist of one or more collation directives, separated by commas ( ','
). If the number of operands exceeds the {COLL_WEIGHTS_MAX} limit, the utility
shall issue a warning message. The following directives shall be supported:


forward
Specifies that comparison operations for the weight level shall proceed from
start of string towards the end of string.
backward
Specifies that comparison operations for the weight level shall proceed from
end of string towards the beginning of string.
position
Specifies that comparison operations for the weight level shall consider the
relative position of elements in the strings not subject to IGNORE. The
string containing an element not subject to IGNORE after the fewest
collating elements subject to IGNORE from the start of the compare shall
collate first. If both strings contain a character not subject to IGNORE
in the same relative position, the collating values assigned to the elements
shall determine the ordering. In case of equality, subsequent characters not
subject to IGNORE shall be considered in the same manner.

The directives forward and backward are mutually-exclusive.


If no operands are specified, a single forward operand shall be
assumed.


For example:


order_start forward;backward
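
The operand rules can be sketched as a small Python parser. The function parse_order_start is hypothetical; it returns one set of directives per weight level and does not check the {COLL_WEIGHTS_MAX} limit.

```python
def parse_order_start(line: str) -> list[set[str]]:
    """Return one set of collation directives per weight level."""
    keyword, _, operands = line.strip().partition(" ")
    if keyword != "order_start":
        raise ValueError("not an order_start line")
    if not operands.strip():
        return [{"forward"}]  # no operands: a single forward operand is assumed
    levels: list[set[str]] = []
    for operand in operands.strip().split(";"):
        directives = set(operand.split(","))
        unknown = directives - {"forward", "backward", "position"}
        if unknown:
            raise ValueError(f"unknown directive(s): {sorted(unknown)}")
        if {"forward", "backward"} <= directives:
            raise ValueError("forward and backward are mutually exclusive")
        levels.append(directives)
    return levels


levels = parse_order_start("order_start forward;backward")
print(len(levels))  # 2
```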


Collation Order

The order_start keyword shall be followed by collating identifier
entries. The syntax for the collating element entries is as follows:


"%s %s;%s;...;%s\n", <collating-identifier>, <weight>, <weight>, ...


Each collating-identifier shall consist of either a character (in any
of the forms defined in Locale Definition), a
<collating-element>, a <collating-symbol>, an
ellipsis, or the special symbol UNDEFINED. The order in which collating
elements are specified determines the character order sequence, such that each
collating element shall compare less than the elements following it.


A <collating-element> shall be used to specify multi-character
collating elements, and indicates that the character sequence specified via the
<collating-element> is to be collated as a unit and in the relative
order specified by its place.


A <collating-symbol> can be used to define a position in the
relative order for use in weights. No weights shall be specified with a
<collating-symbol>.


The ellipsis symbol specifies that a sequence of characters shall collate
according to their encoded character values. It shall be interpreted as
indicating that all characters with a coded character set value higher than the
value of the character in the preceding line, and lower than the coded character
set value for the character in the following line, in the current coded
character set, shall be placed in the character collation order between the
previous and the following character in ascending order according to their coded
character set values. An initial ellipsis shall be interpreted as if the
preceding line specified the NUL character, and a trailing ellipsis as if the
following line specified the highest coded character set value in the current
coded character set. An ellipsis shall be treated as invalid if the preceding or
following lines do not specify characters in the current coded character set.
The use of the ellipsis symbol ties the definition to a specific coded character
set and may preclude the definition from being portable between
implementations.


The symbol UNDEFINED shall be interpreted as including all coded
character set values not specified explicitly or via the ellipsis symbol. Such
characters shall be inserted in the character collation order at the point
indicated by the symbol, and in ascending order according to their coded
character set values. If no UNDEFINED symbol is specified, and the
current coded character set contains characters not specified in this section,
the utility shall issue a warning message and place such characters at the end
of the character collation order.


The optional operands for each collation-element shall be used to define the
primary, secondary, or subsequent weights for the collating element. The first
operand specifies the relative primary weight, the second the relative secondary
weight, and so on. Two or more collation-elements can be assigned the same
weight; they belong to the same "equivalence class" if they have the same
primary weight. Collation shall behave as if, for each weight level, elements
subject to IGNORE are removed, unless the position collation
directive is specified for the corresponding level with the order_start
keyword. Then each successive pair of elements shall be compared according to
the relative weights for the elements. If the two strings compare equal, the
process shall be repeated for the next weight level, up to the limit
{COLL_WEIGHTS_MAX}.


Weights shall be expressed as characters (in any of the forms specified in Locale Definition), <collating-symbol>s,
<collating-element>s, an ellipsis, or the special symbol
IGNORE. A single character, a <collating-symbol>, or a
<collating-element> shall represent the relative position in the
character collating sequence of the character or symbol, rather than the
character or characters themselves. Thus, rather than assigning absolute values
to weights, a particular weight is expressed using the relative order value
assigned to a collating element based on its order in the character collation
sequence.


One-to-many mapping is indicated by specifying two or more concatenated
characters or symbolic names. For example, if the <eszet> is given the
string "<s><s>" as a weight, comparisons are performed as
if all occurrences of the <eszet> are replaced by
"<s><s>" (assuming that "<s>" has the
collating weight "<s>" ). If it is necessary to define
<eszet> and "<s><s>" as an equivalence class, then a
collating element must be defined for the string "ss".
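
A minimal Python sketch of the one-to-many case follows. The EXPANSIONS table and primary_key function are hypothetical, and the characters themselves stand in for their primary weights.

```python
# Hypothetical expansion table: <eszet> is weighted as "<s><s>".
EXPANSIONS = {"ß": "ss"}


def primary_key(text: str) -> str:
    """Replace each character by its expansion before comparing."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text)


print(primary_key("Straße") == primary_key("Strasse"))  # True
```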


All characters specified via an ellipsis shall by default be assigned unique
weights, equal to the relative order of characters. Characters specified via an
explicit or implicit UNDEFINED special symbol shall by default be
assigned the same primary weight (that is, they belong to the same equivalence
class). An ellipsis symbol as a weight shall be interpreted to mean that each
character in the sequence shall have unique weights, equal to the relative order
of their character in the character collation sequence. The use of the ellipsis
as a weight shall be treated as an error if the collating element is neither an
ellipsis nor the special symbol UNDEFINED.


The special keyword IGNORE as a weight shall indicate that when
strings are compared using the weights at the level where IGNORE is
specified, the collating element shall be ignored; that is, as if the string did
not contain the collating element. In regular expressions and pattern matching,
all characters that are subject to IGNORE in their primary weight form an
equivalence class.
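
The effect of IGNORE at one weight level can be sketched in Python. The PRIMARY table is hypothetical; None is used here to mark an element that is ignored at the primary level.

```python
IGNORE = None

# Hypothetical primary weights: the hyphen is ignored at this level.
PRIMARY = {"a": 1, "b": 2, "c": 3, "-": IGNORE}


def primary_key(text: str) -> list[int]:
    """Build the primary key as if ignored elements were not in the string."""
    return [PRIMARY[ch] for ch in text if PRIMARY[ch] is not IGNORE]


print(primary_key("a-b") == primary_key("ab"))  # True
```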


An empty operand shall be interpreted as the collating element itself.


For example, the order statement:


<a> <a>;<a>


is equal to:


<a>


An ellipsis can be used as an operand if the collating element was an
ellipsis, and shall be interpreted as the value of each character defined by the
ellipsis.


The collation order as defined in this section affects the interpretation of
bracket expressions in regular expressions (see RE Bracket Expression).


For example:


order_start forward;backward
UNDEFINED IGNORE;IGNORE
<LOW>
<space> <LOW>;<space>
... <LOW>;...
<a> <a>;<a>
<a-acute> <a>;<a-acute>
<a-grave> <a>;<a-grave>
<A> <a>;<A>
<A-acute> <a>;<A-acute>
<A-grave> <a>;<A-grave>
<ch> <ch>;<ch>
<Ch> <ch>;<Ch>
<s> <s>;<s>
<eszet> "<s><s>";"<eszet><eszet>"
order_end


This example is interpreted as follows:



  1. The UNDEFINED means that all characters not specified in this
    definition (explicitly or via the ellipsis) shall be ignored for collation
    purposes.



  2. All characters between <space> and 'a' shall have the same
    primary equivalence class and individual secondary weights based on their
    ordinal encoded values.



  3. All characters based on the uppercase or lowercase character 'a'
    belong to the same primary equivalence class.



  4. The multi-character collating element <ch> is represented by the
    collating symbol <ch> and belongs to the same primary equivalence class as
    the multi-character collating element <Ch>.


The order_end Keyword

The collating order entries shall be terminated with an order_end
keyword.


LC_COLLATE Category in the POSIX Locale

The collation sequence definition of the POSIX locale follows; the code
listing depicts the localedef
input.

LC_COLLATE
# This is the POSIX locale definition for the LC_COLLATE category.
# The order is the same as in the ASCII codeset.
order_start forward
<NUL>
<SOH>
<STX>
<ETX>
<EOT>
<ENQ>
<ACK>
<alert>
<backspace>
<tab>
<newline>
<vertical-tab>
<form-feed>
<carriage-return>
<SO>
<SI>
<DLE>
<DC1>
<DC2>
<DC3>
<DC4>
<NAK>
<SYN>
<ETB>
<CAN>
<EM>
<SUB>
<ESC>
<IS4>
<IS3>
<IS2>
<IS1>
<space>
<exclamation-mark>
<quotation-mark>
<number-sign>
<dollar-sign>
<percent-sign>
<ampersand>
<apostrophe>
<left-parenthesis>
<right-parenthesis>
<asterisk>
<plus-sign>
<comma>
<hyphen>
<period>
<slash>
<zero>
<one>
<two>
<three>
<four>
<five>
<six>
<seven>
<eight>
<nine>
<colon>
<semicolon>
<less-than-sign>
<equals-sign>
<greater-than-sign>
<question-mark>
<commercial-at>
<A>
<B>
<C>
<D>
<E>
<F>
<G>
<H>
<I>
<J>
<K>
<L>
<M>
<N>
<O>
<P>
<Q>
<R>
<S>
<T>
<U>
<V>
<W>
<X>
<Y>
<Z>
<left-square-bracket>
<backslash>
<right-square-bracket>
<circumflex>
<underscore>
<grave-accent>
<a>
<b>
<c>
<d>
<e>
<f>
<g>
<h>
<i>
<j>
<k>
<l>
<m>
<n>
<o>
<p>
<q>
<r>
<s>
<t>
<u>
<v>
<w>
<x>
<y>
<z>
<left-curly-bracket>
<vertical-line>
<right-curly-bracket>
<tilde>
<DEL>
order_end
#
END LC_COLLATE
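Because the listing above collates in codeset order, strcoll() under the POSIX locale should agree with a plain byte-by-byte comparison. The following is a minimal sketch in C of how that could be checked; the helper name posix_collate_sign() is hypothetical and not part of any standard interface.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: report how strcoll() orders a and b under the
 * POSIX locale as -1, 0 or 1. Returns -2 if the locale cannot be
 * selected. Note that setlocale() changes process-wide state. */
int posix_collate_sign(const char *a, const char *b)
{
    int r;

    if (setlocale(LC_COLLATE, "POSIX") == NULL)
        return -2;
    r = strcoll(a, b);
    return (r > 0) - (r < 0);
}
```

With this ordering every uppercase letter sorts before every lowercase letter, so "Z" collates before "a", and digits collate before letters.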


7.3.3 LC_MONETARY


The LC_MONETARY category shall define the rules and symbols that are
used to format monetary numeric information.


This information is available through the localeconv() function [XSI] [Option Start]  and is used by the strfmon() function. [Option End]


[XSI] [Option Start] Some of the information is
also available in an alternative form via the nl_langinfo() function (see
CRNCYSTR in <langinfo.h>). [Option End]


The following items are defined in this category of the locale. The item
names are the keywords recognized by the localedef utility when defining a
locale. They are also similar to the member names of the lconv structure
defined in <locale.h>; see <locale.h> for the exact symbols in the
header. The localeconv()
function returns {CHAR_MAX} for unspecified integer items and the empty string (
"" ) for unspecified or size zero string items.


In a locale definition file, the operands are strings, formatted as indicated
by the grammar in Locale Definition Grammar. For some
keywords, the strings can contain only integers. Keywords that are not provided,
string values set to the empty string ( "" ), or integer keywords set
to -1, are used to indicate that the value is not available in the locale. The
following keywords shall be recognized:


copy
Specify the name of an existing locale which shall be used as the definition
of this category. If this keyword is specified, no other keyword shall be
specified.
Note:
This is a localedef utility
keyword, unavailable through localeconv().

int_curr_symbol
The international currency symbol. The operand shall be a four-character
string, with the first three characters containing the alphabetic international
currency symbol. The international currency symbol should be chosen in
accordance with those specified in the ISO 4217 standard. The fourth character
shall be the character used to separate the international currency symbol from
the monetary quantity.
currency_symbol
The string that shall be used as the local currency symbol.
mon_decimal_point
The operand is a string containing the symbol that shall be used as the
decimal delimiter (radix character) in monetary formatted quantities.
mon_thousands_sep
The operand is a string containing the symbol that shall be used as a
separator for groups of digits to the left of the decimal delimiter in formatted
monetary quantities.
mon_grouping
Define the size of each group of digits in formatted monetary quantities.
The operand is a sequence of integers separated by semicolons. Each integer
specifies the number of digits in each group, with the initial integer defining
the size of the group immediately preceding the decimal delimiter, and the
following integers defining the preceding groups. If the last integer is not -1,
then the size of the previous group (if any) shall be repeatedly used for the
remainder of the digits. If the last integer is -1, then no further grouping
shall be performed.
positive_sign
A string that shall be used to indicate a non-negative-valued formatted
monetary quantity.
negative_sign
A string that shall be used to indicate a negative-valued formatted monetary
quantity.
int_frac_digits
An integer representing the number of fractional digits (those to the right
of the decimal delimiter) to be written in a formatted monetary quantity using
int_curr_symbol.
frac_digits
An integer representing the number of fractional digits (those to the right
of the decimal delimiter) to be written in a formatted monetary quantity using
currency_symbol.
p_cs_precedes
An integer set to 1 if the currency_symbol precedes the value for a
monetary quantity with a non-negative value, and set to 0 if the symbol succeeds
the value.
p_sep_by_space
Set to a value indicating the separation of the currency_symbol, the
sign string, and the value for a non-negative formatted monetary quantity.

The values of p_sep_by_space, n_sep_by_space,
int_p_sep_by_space, and int_n_sep_by_space are interpreted
according to the following:


0
No space separates the currency symbol and value.
1
If the currency symbol and sign string are adjacent, a space separates them
from the value; otherwise, a space separates the currency symbol from the value.

2
If the currency symbol and sign string are adjacent, a space separates them;
otherwise, a space separates the sign string from the value.
n_cs_precedes
An integer set to 1 if the currency_symbol precedes the value for a
monetary quantity with a negative value, and set to 0 if the symbol succeeds the
value.
n_sep_by_space
Set to a value indicating the separation of the currency_symbol, the
sign string, and the value for a negative formatted monetary quantity.
p_sign_posn
An integer set to a value indicating the positioning of the
positive_sign for a monetary quantity with a non-negative value. The
following integer values shall be recognized for int_n_sign_posn,
int_p_sign_posn, n_sign_posn, and p_sign_posn:
0
Parentheses enclose the quantity and the currency_symbol.
1
The sign string precedes the quantity and the currency_symbol.
2
The sign string succeeds the quantity and the currency_symbol.
3
The sign string precedes the currency_symbol.
4
The sign string succeeds the currency_symbol.
n_sign_posn
An integer set to a value indicating the positioning of the
negative_sign for a negative formatted monetary quantity.
int_p_cs_precedes
An integer set to 1 if the int_curr_symbol precedes the value for a
monetary quantity with a non-negative value, and set to 0 if the symbol succeeds
the value.
int_n_cs_precedes
An integer set to 1 if the int_curr_symbol precedes the value for a
monetary quantity with a negative value, and set to 0 if the symbol succeeds the
value.
int_p_sep_by_space
Set to a value indicating the separation of the int_curr_symbol, the
sign string, and the value for a non-negative internationally formatted monetary
quantity.
int_n_sep_by_space
Set to a value indicating the separation of the int_curr_symbol, the
sign string, and the value for a negative internationally formatted monetary
quantity.
int_p_sign_posn
An integer set to a value indicating the positioning of the
positive_sign for a positive monetary quantity formatted with the
international format.
int_n_sign_posn
An integer set to a value indicating the positioning of the
negative_sign for a negative monetary quantity formatted with the
international format.
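The interaction between the *_cs_precedes and *_sign_posn keywords described above can be sketched in C as follows. This is an illustration only: the function name place_sign() and its parameters are hypothetical, the spacing rules (*_sep_by_space) are deliberately left out, and real programs would normally rely on strfmon() where it is available.

```c
#include <stdio.h>

/* Hypothetical helper: compose sign, currency symbol (cur) and quantity
 * (qty) according to cs_precedes (0 or 1) and sign_posn (0 to 4).
 * Spacing is ignored in this sketch. Returns 0 on success, -1 for an
 * unknown sign_posn or if out is too small. */
int place_sign(char *out, size_t outsz,
               const char *sign, const char *cur, const char *qty,
               int cs_precedes, int sign_posn)
{
    const char *first  = cs_precedes ? cur : qty;
    const char *second = cs_precedes ? qty : cur;
    int n;

    switch (sign_posn) {
    case 0: /* parentheses enclose quantity and currency symbol */
        n = snprintf(out, outsz, "(%s%s)", first, second);
        break;
    case 1: /* sign precedes quantity and currency symbol */
        n = snprintf(out, outsz, "%s%s%s", sign, first, second);
        break;
    case 2: /* sign succeeds quantity and currency symbol */
        n = snprintf(out, outsz, "%s%s%s", first, second, sign);
        break;
    case 3: /* sign immediately precedes the currency symbol */
        n = cs_precedes ? snprintf(out, outsz, "%s%s%s", sign, cur, qty)
                        : snprintf(out, outsz, "%s%s%s", qty, sign, cur);
        break;
    case 4: /* sign immediately succeeds the currency symbol */
        n = cs_precedes ? snprintf(out, outsz, "%s%s%s", cur, sign, qty)
                        : snprintf(out, outsz, "%s%s%s", qty, cur, sign);
        break;
    default:
        return -1;
    }
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

For example, with sign "-", currency symbol "$" and quantity "1.25", a preceding symbol with sign position 4 gives "$-1.25", while a succeeding symbol with sign position 3 gives "1.25-$".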
LC_MONETARY Category in the POSIX Locale

The monetary formatting definitions for the POSIX locale follow; the code
listing depicting the localedef input, the table representing the same
information with the addition of localeconv() [XSI] [Option Start] and nl_langinfo() [Option End] formats. All values are
unspecified in the POSIX locale.

LC_MONETARY
# This is the POSIX locale definition for
# the LC_MONETARY category.
#
int_curr_symbol ""
currency_symbol ""
mon_decimal_point ""
mon_thousands_sep ""
mon_grouping -1
positive_sign ""
negative_sign ""
int_frac_digits -1
frac_digits -1
p_cs_precedes -1
p_sep_by_space -1
n_cs_precedes -1
n_sep_by_space -1
p_sign_posn -1
n_sign_posn -1
int_p_cs_precedes -1
int_p_sep_by_space -1
int_n_cs_precedes -1
int_n_sep_by_space -1
int_p_sign_posn -1
int_n_sign_posn -1
#
END LC_MONETARY


Item                 langinfo   POSIX Locale   localeconv()   localedef
                     Constant   Value          Value          Value
int_curr_symbol      -          N/A            ""             ""
currency_symbol      CRNCYSTR   N/A            ""             ""
mon_decimal_point    -          N/A            ""             ""
mon_thousands_sep    -          N/A            ""             ""
mon_grouping         -          N/A            ""             -1
positive_sign        -          N/A            ""             ""
negative_sign        -          N/A            ""             ""
int_frac_digits      -          N/A            {CHAR_MAX}     -1
frac_digits          -          N/A            {CHAR_MAX}     -1
p_cs_precedes        CRNCYSTR   N/A            {CHAR_MAX}     -1
p_sep_by_space       -          N/A            {CHAR_MAX}     -1
n_cs_precedes        CRNCYSTR   N/A            {CHAR_MAX}     -1
n_sep_by_space       -          N/A            {CHAR_MAX}     -1
p_sign_posn          -          N/A            {CHAR_MAX}     -1
n_sign_posn          -          N/A            {CHAR_MAX}     -1
int_p_cs_precedes    -          N/A            {CHAR_MAX}     -1
int_p_sep_by_space   -          N/A            {CHAR_MAX}     -1
int_n_cs_precedes    -          N/A            {CHAR_MAX}     -1
int_n_sep_by_space   -          N/A            {CHAR_MAX}     -1
int_p_sign_posn      -          N/A            {CHAR_MAX}     -1
int_n_sign_posn      -          N/A            {CHAR_MAX}     -1


[XSI] [Option Start] In the preceding table, the
langinfo Constant column represents an XSI-conformant extension. [Option End] The entry N/A indicates
that the value is not available in the POSIX locale.
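The localeconv() values for the POSIX locale can be checked from a program. The sketch below, in C, inspects a few members of struct lconv after selecting the "POSIX" locale; the function name posix_monetary_unspecified() is hypothetical, and only a subset of the members is tested.

```c
#include <limits.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: return 1 if selected monetary members of
 * struct lconv hold the "unspecified" values ("" or CHAR_MAX) under
 * the POSIX locale, 0 if they do not, and -1 if the locale cannot be
 * selected. */
int posix_monetary_unspecified(void)
{
    const struct lconv *lc;

    if (setlocale(LC_MONETARY, "POSIX") == NULL)
        return -1;
    lc = localeconv();
    return strcmp(lc->int_curr_symbol, "") == 0
        && strcmp(lc->currency_symbol, "") == 0
        && strcmp(lc->mon_decimal_point, "") == 0
        && lc->int_frac_digits == CHAR_MAX
        && lc->frac_digits == CHAR_MAX
        && lc->p_cs_precedes == CHAR_MAX
        && lc->p_sign_posn == CHAR_MAX;
}
```

An application that finds these values usually has to fall back to its own formatting defaults, since the locale supplies no currency symbol or digit counts.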


7.3.4 LC_NUMERIC


The LC_NUMERIC category shall define the rules and symbols that are
used to format non-monetary numeric information. This information is available
through the localeconv()
function.


[XSI] [Option Start] Some of the information is
also available in an alternative form via the nl_langinfo() function. [Option End]


The following items are defined in this category of the locale. The item
names are the keywords recognized by the localedef utility when defining a
locale. They are also similar to the member names of the lconv structure
defined in <locale.h>; see <locale.h> for the exact symbols in the
header. The localeconv()
function returns {CHAR_MAX} for unspecified integer items and the empty string (
"" ) for unspecified or size zero string items.


In a locale definition file, the operands are strings, formatted as indicated
by the grammar in Locale Definition Grammar. For some
keywords, the strings can only contain integers. Keywords that are not provided,
string values set to the empty string ( "" ), or integer keywords set
to -1, shall be used to indicate that the value is not available in the locale.
The following keywords shall be recognized:


copy
Specify the name of an existing locale which shall be used as the definition
of this category. If this keyword is specified, no other keyword shall be
specified.
Note:
This is a localedef utility
keyword, unavailable through localeconv().

decimal_point
The operand is a string containing the symbol that shall be used as the
decimal delimiter (radix character) in numeric, non-monetary formatted
quantities. This keyword cannot be omitted and cannot be set to the empty
string. In contexts where standards limit the decimal_point to a single
byte, the result of specifying a multi-byte operand shall be unspecified.
thousands_sep
The operand is a string containing the symbol that shall be used as a
separator for groups of digits to the left of the decimal delimiter in numeric,
non-monetary formatted quantities. In contexts where standards limit
the thousands_sep to a single byte, the result of specifying a multi-byte
operand shall be unspecified.
grouping
Define the size of each group of digits in formatted non-monetary
quantities. The operand is a sequence of integers separated by semicolons. Each
integer specifies the number of digits in each group, with the initial integer
defining the size of the group immediately preceding the decimal delimiter, and
the following integers defining the preceding groups. If the last integer is not
-1, then the size of the previous group (if any) shall be repeatedly used for
the remainder of the digits. If the last integer is -1, then no further grouping
shall be performed.
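The grouping rule described for the grouping keyword (and, identically, for mon_grouping) can be sketched in C as follows. The function group_digits() and the MAX_DIGITS limit are hypothetical names introduced for illustration; the sizes array mirrors the localedef operand (semicolon-separated integers), not the encoded string returned by localeconv().

```c
#include <string.h>

#define MAX_DIGITS 64   /* arbitrary limit for this sketch */

/* Hypothetical helper: insert sep between groups of the integer-part
 * digit string. sizes[0] is the group nearest the decimal delimiter;
 * a final -1 stops grouping, otherwise the last size repeats.
 * Returns 0 on success, -1 if the input is too long or out too small. */
int group_digits(const char *digits, const int *sizes, size_t n,
                 char sep, char *out, size_t outsz)
{
    size_t len = strlen(digits);
    unsigned char sep_before[MAX_DIGITS] = {0};
    size_t left = len;  /* digits to the left of the last separator */
    size_t i = 0;       /* index of the group size in use */
    size_t o = 0;       /* bytes written to out */
    size_t k;

    if (len > MAX_DIGITS)
        return -1;

    /* Walk leftwards from the decimal delimiter, marking separators. */
    while (n > 0 && sizes[i] > 0 && (size_t)sizes[i] < left) {
        left -= (size_t)sizes[i];
        sep_before[left] = 1;
        if (i + 1 < n)
            i++;        /* otherwise keep repeating the last size */
    }

    for (k = 0; k < len; k++) {
        if (sep_before[k]) {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = sep;
        }
        if (o + 1 >= outsz)
            return -1;
        out[o++] = digits[k];
    }
    if (o >= outsz)
        return -1;
    out[o] = '\0';
    return 0;
}
```

With the digits "1234567" and a comma separator, an operand of 3 yields "1,234,567", an operand of 3;-1 yields "1234,567", and an operand of 3;2 yields "12,34,567".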
LC_NUMERIC Category in the POSIX Locale

The non-monetary numeric formatting definitions for the POSIX locale follow;
the code listing depicting the localedef input, the table
representing the same information with the addition of localeconv() values, [XSI] [Option Start]  and nl_langinfo() constants. [Option End]

LC_NUMERIC
# This is the POSIX locale definition for
# the LC_NUMERIC category.
#
decimal_point "<period>"
thousands_sep ""
grouping -1
#
END LC_NUMERIC


Item            langinfo    POSIX Locale   localeconv()   localedef
                Constant    Value          Value          Value
decimal_point   RADIXCHAR   "."            "."            .
thousands_sep   THOUSEP     N/A            ""             ""
grouping        -           N/A            ""             -1


[XSI] [Option Start] In the preceding table, the
langinfo Constant column represents an XSI-conforming extension. [Option End] The entry N/A indicates
that the value is not available in the POSIX locale.
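The LC_NUMERIC values for the POSIX locale can be read back through both interfaces. The following C sketch compares localeconv() with nl_langinfo(RADIXCHAR); it assumes an implementation that provides <langinfo.h> (the XSI extension), and the function name posix_numeric_matches_table() is hypothetical.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: return 1 if the POSIX locale reports "." as the
 * radix character through both localeconv() and nl_langinfo(), with an
 * empty thousands separator and no grouping; 0 otherwise; -1 if the
 * locale cannot be selected. */
int posix_numeric_matches_table(void)
{
    const struct lconv *lc;

    if (setlocale(LC_NUMERIC, "POSIX") == NULL)
        return -1;
    lc = localeconv();
    return strcmp(lc->decimal_point, ".") == 0
        && strcmp(lc->thousands_sep, "") == 0
        && strcmp(lc->grouping, "") == 0
        && strcmp(nl_langinfo(RADIXCHAR), ".") == 0;
}
```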


7.3.5 LC_TIME


The LC_TIME category shall define the interpretation of the conversion
specifications supported by the date utility and shall affect the behavior of
the strftime(), wcsftime(), strptime(), [XSI] [Option Start] and nl_langinfo() [Option End] functions. Since the interfaces
for C-language access and locale definition differ significantly, they are
described separately.


LC_TIME Locale Definition

In a locale definition, the following mandatory keywords shall be
recognized:


copy
Specify the name of an existing locale which shall be used as the definition
of this category. If this keyword is specified, no other keyword shall be
specified.
abday
Define the abbreviated weekday names, corresponding to the %a
conversion specification (conversion specification in the strftime(), wcsftime(), and strptime() functions). The operand
shall consist of seven semicolon-separated strings, each surrounded by
double-quotes. The first string shall be the abbreviated name of the day
corresponding to Sunday, the second the abbreviated name of the day
corresponding to Monday, and so on.
day
Define the full weekday names, corresponding to the %A conversion
specification. The operand shall consist of seven semicolon-separated strings,
each surrounded by double-quotes. The first string is the full name of the day
corresponding to Sunday, the second the full name of the day corresponding to
Monday, and so on.
abmon
Define the abbreviated month names, corresponding to the %b
conversion specification. The operand shall consist of twelve
semicolon-separated strings, each surrounded by double-quotes. The first string
shall be the abbreviated name of the first month of the year (January), the
second the abbreviated name of the second month, and so on.
mon
Define the full month names, corresponding to the %B conversion
specification. The operand shall consist of twelve semicolon-separated strings,
each surrounded by double-quotes. The first string shall be the full name of the
first month of the year (January), the second the full name of the second month,
and so on.
d_t_fmt
Define the appropriate date and time representation, corresponding to the
%c conversion specification. The operand shall consist of a string
containing any combination of characters and conversion specifications. In
addition, the string can contain escape sequences defined in the table in
Escape Sequences and Associated Actions ( '\\', '\a', '\b', '\f', '\n', '\r',
'\t', '\v' ).
d_fmt
Define the appropriate date representation, corresponding to the %x
conversion specification. The operand shall consist of a string containing any
combination of characters and conversion specifications. In addition, the string
can contain escape sequences defined in Escape Sequences and Associated Actions.
t_fmt
Define the appropriate time representation, corresponding to the %X
conversion specification. The operand shall consist of a string containing any
combination of characters and conversion specifications. In addition, the string
can contain escape sequences defined in Escape Sequences and Associated Actions.
am_pm
Define the appropriate representation of the ante-meridiem and
post-meridiem strings, corresponding to the %p conversion
specification. The operand shall consist of two strings, separated by a
semicolon, each surrounded by double-quotes. The first string shall represent
the ante-meridiem designation, the last string the post-meridiem
designation.
t_fmt_ampm
Define the appropriate time representation in the 12-hour clock format with
am_pm, corresponding to the %r conversion specification. The
operand shall consist of a string and can contain any combination of characters
and conversion specifications. If the string is empty, the 12-hour format is not
supported in the locale.
era
Define how years are counted and displayed for each era in a locale. The
operand shall consist of semicolon-separated strings. Each string shall be an
era description segment with the format:
direction:offset:start_date:end_date:era_name:era_format

according to the definitions below. There can be as many era description
segments as are necessary to describe the different eras.


Note:
The start of an era might not be the earliest point in the era; it may be the
latest. For example, the Christian era BC starts on the day before January 1, AD
1, and increases with earlier time.
direction
Either a '+' or a '-' character. The '+'
character shall indicate that years closer to the start_date have lower
numbers than those closer to the end_date. The '-' character
shall indicate that years closer to the start_date have higher numbers
than those closer to the end_date.
offset
The number of the year closest to the start_date in the era,
corresponding to the %Ey conversion specification.
start_date
A date in the form yyyy/mm/dd, where yyyy,
mm, and dd are the year, month, and day numbers respectively of
the start of the era. Years prior to AD 1 shall be represented as negative
numbers.
end_date
The ending date of the era, in the same format as the start_date, or
one of the two special values "-*" or "+*". The value
"-*" shall indicate that the ending date is the beginning of time. The
value "+*" shall indicate that the ending date is the end of time.
era_name
A string representing the name of the era, corresponding to the %EC
conversion specification.
era_format
A string for formatting the year in the era, corresponding to the
%EY conversion specification.
era_d_fmt
Define the format of the date in alternative era notation, corresponding to
the %Ex conversion specification.
era_t_fmt
Define the locale's appropriate alternative time format, corresponding to
the %EX conversion specification.
era_d_t_fmt
Define the locale's appropriate alternative date and time format,
corresponding to the %Ec conversion specification.
alt_digits
Define alternative symbols for digits, corresponding to the %O
modified conversion specification. The operand shall consist of
semicolon-separated strings, each surrounded by double-quotes. The first string
shall be the alternative symbol corresponding with zero, the second string the
symbol corresponding with one, and so on. Up to 100 alternative symbol strings
can be specified. The %O modifier shall indicate that the string
corresponding to the value specified via the conversion specification shall be
used instead of the value.
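The six-field era description segment defined for the era keyword can be split with a short parser. The C sketch below is an illustration under stated assumptions: struct era_segment, parse_era_segment() and the field sizes are hypothetical, and the sample segments mentioned afterwards are made up for the example rather than taken from a real locale.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one era description segment. */
struct era_segment {
    char direction;       /* '+' or '-' */
    int  offset;          /* year number closest to start_date */
    char start_date[32];  /* yyyy/mm/dd, year may be negative */
    char end_date[32];    /* yyyy/mm/dd, "-*" or "+*" */
    char era_name[64];
    char era_format[64];
};

/* Split "direction:offset:start_date:end_date:era_name:era_format".
 * Returns 0 on success, -1 if the segment is malformed or a field
 * does not fit. */
int parse_era_segment(const char *seg, struct era_segment *out)
{
    int consumed = 0;
    size_t len;
    int got = sscanf(seg, "%c:%d:%31[^:]:%31[^:]:%63[^:]:%n",
                     &out->direction, &out->offset,
                     out->start_date, out->end_date, out->era_name,
                     &consumed);

    /* %n is only stored if the fifth ':' was matched. */
    if (got != 5 || consumed == 0)
        return -1;
    if (out->direction != '+' && out->direction != '-')
        return -1;

    /* era_format is everything after the fifth colon. */
    len = strlen(seg + consumed);
    if (len >= sizeof out->era_format)
        return -1;
    memcpy(out->era_format, seg + consumed, len + 1);
    return 0;
}
```

A made-up segment such as "+:1:0001/01/01:+*:AD:%EC %Ey" would parse into direction '+', offset 1, an open-ended end date "+*", era name "AD" and era format "%EC %Ey".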
LC_TIME C-Language Access

[XSI] [Option Start] This section describes
extensions to access information in the LC_TIME category using the nl_langinfo() function. This
functionality is dependent on support of the XSI extension (and the rest of this
section is not further marked for this option). [Option End]


The following constants used to identify items of langinfo data can be
used as arguments to the nl_langinfo() function to access
information in the LC_TIME category. These constants are defined in the
<langinfo.h> header.


ABDAY_x
The abbreviated weekday names (for example, Sun), where x is a number
from 1 to 7.
DAY_x
The full weekday names (for example, Sunday), where x is a number
from 1 to 7.
ABMON_x
The abbreviated month names (for example, Jan), where x is a number
from 1 to 12.
MON_x
The full month names (for example, January), where x is a number from
1 to 12.
D_T_FMT
The appropriate date and time representation.
D_FMT
The appropriate date representation.
T_FMT
The appropriate time representation.
AM_STR
The appropriate ante-meridiem affix.
PM_STR
The appropriate post-meridiem affix.
T_FMT_AMPM
The appropriate time representation in the 12-hour clock format with AM_STR
and PM_STR.
ERA
The era description segments, which describe how years are counted and
displayed for each era in a locale. Each era description segment shall have the
format:
direction:offset:start_date:end_date:era_name:era_format

according to the definitions below. There can be as many era description
segments as are necessary to describe the different eras. Era description
segments are separated by semicolons.


direction
Either a '+' or a '-' character. The '+'
character shall indicate that years closer to the start_date have lower
numbers than those closer to the end_date. The '-' character
shall indicate that years closer to the start_date have higher numbers
than those closer to the end_date.
offset
The number of the year closest to the start_date in the era.
start_date
A date in the form yyyy/mm/dd, where yyyy,
mm, and dd are the year, month, and day numbers respectively of
the start of the era. Years prior to AD 1 shall be represented as negative
numbers.
end_date
The ending date of the era, in the same format as the start_date, or
one of the two special values "-*" or "+*". The value
"-*" shall indicate that the ending date is the beginning of time. The
value "+*" shall indicate that the ending date is the end of time.
era_name
The era, corresponding to the %EC conversion specification.
era_format
The format of the year in the era, corresponding to the %EY
conversion specification.
ERA_D_FMT
The era date format.
ERA_T_FMT
The locale's appropriate alternative time format, corresponding to the
%EX conversion specification.
ERA_D_T_FMT
The locale's appropriate alternative date and time format, corresponding to
the %Ec conversion specification.
ALT_DIGITS
The alternative symbols for digits, corresponding to the %O
conversion specification modifier. The value consists of semicolon-separated
symbols. The first is the alternative symbol corresponding to zero, the second
is the symbol corresponding to one, and so on. Up to 100 alternative symbols may
be specified.

LC_TIME Category in the POSIX Locale

The LC_TIME category definition of the POSIX locale follows; the code
listing depicts the localedef input; the table represents the same information
with the addition of localedef keywords, conversion specifiers used by the date
utility and the strftime(), wcsftime(), and strptime() functions, [XSI] [Option Start] and nl_langinfo() constants. [Option End]

LC_TIME
# This is the POSIX locale definition for
# the LC_TIME category.
#
# Abbreviated weekday names (%a)
abday "<S><u><n>";"<M><o><n>";"<T><u><e>";"<W><e><d>";\
"<T><h><u>";"<F><r><i>";"<S><a><t>"
#
# Full weekday names (%A)
day "<S><u><n><d><a><y>";"<M><o><n><d><a><y>";\
"<T><u><e><s><d><a><y>";"<W><e><d><n><e><s><d><a><y>";\
"<T><h><u><r><s><d><a><y>";"<F><r><i><d><a><y>";\
"<S><a><t><u><r><d><a><y>"
#
# Abbreviated month names (%b)
abmon "<J><a><n>";"<F><e><b>";"<M><a><r>";\
"<A><p><r>";"<M><a><y>";"<J><u><n>";\
"<J><u><l>";"<A><u><g>";"<S><e><p>";\
"<O><c><t>";"<N><o><v>";"<D><e><c>"
#
# Full month names (%B)
mon "<J><a><n><u><a><r><y>";"<F><e><b><r><u><a><r><y>";\
"<M><a><r><c><h>";"<A><p><r><i><l>";\
"<M><a><y>";"<J><u><n><e>";\
"<J><u><l><y>";"<A><u><g><u><s><t>";\
"<S><e><p><t><e><m><b><e><r>";"<O><c><t><o><b><e><r>";\
"<N><o><v><e><m><b><e><r>";"<D><e><c><e><m><b><e><r>"
#
# Equivalent of AM/PM (%p) "AM";"PM"
am_pm "<A><M>";"<P><M>"
#
# Appropriate date and time representation (%c)
# "%a %b %e %H:%M:%S %Y"
d_t_fmt "<percent-sign><a><space><percent-sign><b>\
<space><percent-sign><e><space><percent-sign><H>\
<colon><percent-sign><M><colon><percent-sign><S>\
<space><percent-sign><Y>"
#
# Appropriate date representation (%x) "%m/%d/%y"
d_fmt "<percent-sign><m><slash><percent-sign><d>\
<slash><percent-sign><y>"
#
# Appropriate time representation (%X) "%H:%M:%S"
t_fmt "<percent-sign><H><colon><percent-sign><M>\
<colon><percent-sign><S>"
#
# Appropriate 12-hour time representation (%r) "%I:%M:%S %p"
t_fmt_ampm "<percent-sign><I><colon><percent-sign><M><colon>\
<percent-sign><S><space><percent-sign><p>"
#
END LC_TIME


localedef     langinfo      Conversion       POSIX
Keyword       Constant      Specification    Locale Value
d_t_fmt       D_T_FMT       %c               "%a %b %e %H:%M:%S %Y"
d_fmt         D_FMT         %x               "%m/%d/%y"
t_fmt         T_FMT         %X               "%H:%M:%S"
am_pm         AM_STR        %p               "AM"
am_pm         PM_STR        %p               "PM"
t_fmt_ampm    T_FMT_AMPM    %r               "%I:%M:%S %p"
day           DAY_1         %A               "Sunday"
day           DAY_2         %A               "Monday"
day           DAY_3         %A               "Tuesday"
day           DAY_4         %A               "Wednesday"
day           DAY_5         %A               "Thursday"
day           DAY_6         %A               "Friday"
day           DAY_7         %A               "Saturday"
abday         ABDAY_1       %a               "Sun"
abday         ABDAY_2       %a               "Mon"
abday         ABDAY_3       %a               "Tue"
abday         ABDAY_4       %a               "Wed"
abday         ABDAY_5       %a               "Thu"
abday         ABDAY_6       %a               "Fri"
abday         ABDAY_7       %a               "Sat"
mon           MON_1         %B               "January"
mon           MON_2         %B               "February"
mon           MON_3         %B               "March"
mon           MON_4         %B               "April"
mon           MON_5         %B               "May"
mon           MON_6         %B               "June"
mon           MON_7         %B               "July"
mon           MON_8         %B               "August"
mon           MON_9         %B               "September"
mon           MON_10        %B               "October"
mon           MON_11        %B               "November"
mon           MON_12        %B               "December"
abmon         ABMON_1       %b               "Jan"
abmon         ABMON_2       %b               "Feb"
abmon         ABMON_3       %b               "Mar"
abmon         ABMON_4       %b               "Apr"
abmon         ABMON_5       %b               "May"
abmon         ABMON_6       %b               "Jun"
abmon         ABMON_7       %b               "Jul"
abmon         ABMON_8       %b               "Aug"
abmon         ABMON_9       %b               "Sep"
abmon         ABMON_10      %b               "Oct"
abmon         ABMON_11      %b               "Nov"
abmon         ABMON_12      %b               "Dec"
era           ERA           %EC, %Ey, %EY    N/A
era_d_fmt     ERA_D_FMT     %Ex              N/A
era_t_fmt     ERA_T_FMT     %EX              N/A
era_d_t_fmt   ERA_D_T_FMT   %Ec              N/A
alt_digits    ALT_DIGITS    %O               N/A


[XSI] [Option Start] In the preceding table, the
langinfo Constant column represents an XSI-conformant extension. [Option End]


The entry N/A indicates the value is not available in the POSIX locale.
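The relationship between the langinfo constants and the conversion specifications in the table can be exercised from C. The sketch below formats a fixed date with the string returned by nl_langinfo(D_T_FMT), which is what %c is defined to correspond to; it assumes <langinfo.h> is available, and the function name format_epoch_posix() is hypothetical.

```c
#include <langinfo.h>
#include <locale.h>
#include <time.h>

/* Hypothetical helper: format 1970-01-01 00:00:00 (a Thursday) with
 * the D_T_FMT string of the POSIX locale. Returns the number of bytes
 * written, or 0 on failure. */
size_t format_epoch_posix(char *buf, size_t bufsz)
{
    struct tm tm = {0};

    tm.tm_year = 70;  /* years since 1900 */
    tm.tm_mon  = 0;   /* January */
    tm.tm_mday = 1;
    tm.tm_wday = 4;   /* Thursday */

    if (setlocale(LC_TIME, "POSIX") == NULL)
        return 0;
    return strftime(buf, bufsz, nl_langinfo(D_T_FMT), &tm);
}
```

Under the POSIX locale the result is "Thu Jan  1 00:00:00 1970"; the two spaces before the day come from %e, which pads a single-digit day with a space.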


7.3.6 LC_MESSAGES


The LC_MESSAGES category shall define the format and values used by
various utilities for affirmative and negative responses. [XSI] [Option Start] This information is
available through the nl_langinfo() function. [Option End]


[XSI] [Option Start] The message catalog used by
the standard utilities and selected by the catopen() function shall be
determined by the setting of NLSPATH ; see Environment Variables. The
LC_MESSAGES category can be specified as part of an NLSPATH
substitution field. [Option End]


The following keywords shall be recognized as part of the locale definition
file.


copy
Specify the name of an existing locale which shall be used as the definition
of this category. If this keyword is specified, no other keyword shall be
specified.
Note:
This is a localedef
keyword, unavailable through nl_langinfo().

yesexpr
The operand consists of an extended regular expression (see Extended Regular Expressions) that
describes the acceptable affirmative response to a question expecting an
affirmative or negative response.
noexpr
The operand consists of an extended regular expression that describes the
acceptable negative response to a question expecting an affirmative or negative
response.

LC_MESSAGES Category in the POSIX Locale

The format and values for affirmative and negative responses of the POSIX
locale follow; the code listing depicting the localedef input, the table
representing the same information [XSI] [Option Start]  with the addition of nl_langinfo() constants. [Option End]

LC_MESSAGES
# This is the POSIX locale definition for
# the LC_MESSAGES category.
#
yesexpr "<circumflex><left-square-bracket><y><Y><right-square-bracket>"
#
noexpr "<circumflex><left-square-bracket><n><N><right-square-bracket>"
#
END LC_MESSAGES


localedef Keyword   langinfo Constant   POSIX Locale Value
yesexpr             YESEXPR             "^[yY]"
noexpr              NOEXPR              "^[nN]"


[XSI] [Option Start] In the preceding table, the
langinfo Constant column represents an XSI-conformant extension. [Option End]


7.4 Locale Definition Grammar


The grammar and lexical conventions in this section shall together describe
the syntax for the locale definition source. The general conventions for this
style of grammar are described in the Shell and Utilities volume of
IEEE Std 1003.1-2001, Section 1.10, Grammar Conventions. The grammar shall take
precedence over the text in this chapter.


7.4.1 Locale Lexical Conventions


The lexical conventions for the locale definition grammar are described in
this section.


The following tokens shall be processed (in addition to those string
constants shown in the grammar):


LOC_NAME
A string of characters representing the name of a locale.
CHAR
Any single character.
NUMBER
A decimal number, represented by one or more decimal digits.
COLLSYMBOL
A symbolic name, enclosed between angle brackets. The string cannot
duplicate any charmap symbol defined in the current charmap (if any), or a
COLLELEMENT symbol.
COLLELEMENT
A symbolic name, enclosed between angle brackets, which cannot duplicate
either any charmap symbol or a COLLSYMBOL symbol.
CHARCLASS
A string of alphanumeric characters from the portable character set, the
first of which is not a digit, consisting of at least one and at most
{CHARCLASS_NAME_MAX} bytes, and optionally surrounded by double-quotes.
CHARSYMBOL
A symbolic name, enclosed between angle brackets, from the current charmap
(if any).
OCTAL_CHAR
One or more octal representations of the encoding of each byte in a single
character. The octal representation consists of an escape character (normally a
backslash) followed by two or more octal digits.
HEX_CHAR
One or more hexadecimal representations of the encoding of each byte in a
single character. The hexadecimal representation consists of an escape character
followed by the constant x and two or more hexadecimal digits.
DECIMAL_CHAR
One or more decimal representations of the encoding of each byte in a single
character. The decimal representation consists of an escape character followed
by a character 'd' and two or more decimal digits.
ELLIPSIS
The string "...".
EXTENDED_REG_EXP
An extended regular expression as defined in the grammar in Regular Expression Grammar.
EOL
The line termination character <newline>.
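The OCTAL_CHAR, HEX_CHAR and DECIMAL_CHAR tokens above each encode one character as a run of escaped byte values. A minimal C sketch of a decoder follows; the names digit_value() and decode_char_constant() are hypothetical, the escape character is passed in because it can be redefined with escape_char, and the code is an illustration of the token rules rather than the lexer of any actual localedef implementation.

```c
#include <stddef.h>

/* Value of c as a digit in the given base, or -1 if it is not one. */
static int digit_value(int c, int base)
{
    int v;

    if (c >= '0' && c <= '9')
        v = c - '0';
    else if (c >= 'a' && c <= 'f')
        v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
        v = c - 'A' + 10;
    else
        return -1;
    return v < base ? v : -1;
}

/* Hypothetical helper: decode one or more byte representations such as
 * \101, \x41 or \d065 into out. Each representation needs at least two
 * digits and must fit in one byte. Returns the number of bytes decoded,
 * or -1 on malformed input or if out is too small. */
int decode_char_constant(const char *s, char esc,
                         unsigned char *out, size_t outsz)
{
    size_t n = 0;

    while (*s != '\0') {
        int base = 8, ndigits = 0, d;
        unsigned value = 0;

        if (*s++ != esc)
            return -1;
        if (*s == 'x') {          /* hexadecimal form */
            base = 16;
            s++;
        } else if (*s == 'd') {   /* decimal form */
            base = 10;
            s++;
        }
        while ((d = digit_value(*s, base)) >= 0) {
            value = value * (unsigned)base + (unsigned)d;
            if (value > 0xFF)
                return -1;
            ndigits++;
            s++;
        }
        if (ndigits < 2 || n == outsz)
            return -1;
        out[n++] = (unsigned char)value;
    }
    return n > 0 ? (int)n : -1;
}
```

With a backslash as the escape character, the three spellings \101, \x41 and \d065 all decode to the single byte 65, and a multi-byte character is written as several representations in a row.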

7.4.2 Locale Grammar


This section presents the grammar for the locale definition.

%token LOC_NAME
%token CHAR
%token NUMBER
%token COLLSYMBOL COLLELEMENT
%token CHARSYMBOL OCTAL_CHAR HEX_CHAR DECIMAL_CHAR
%token ELLIPSIS
%token EXTENDED_REG_EXP
%token EOL


%start locale_definition


%%


locale_definition : global_statements locale_categories
| locale_categories
;


global_statements : global_statements symbol_redefine
| symbol_redefine
;


symbol_redefine : 'escape_char' CHAR EOL
| 'comment_char' CHAR EOL
;


locale_categories : locale_categories locale_category
| locale_category
;


locale_category : lc_ctype | lc_collate | lc_messages
| lc_monetary | lc_numeric | lc_time
;


/* The following grammar rules are common to all categories */


char_list : char_list char_symbol
| char_symbol
;


char_symbol : CHAR | CHARSYMBOL
| OCTAL_CHAR | HEX_CHAR | DECIMAL_CHAR
;


elem_list : elem_list char_symbol
| elem_list COLLSYMBOL
| elem_list COLLELEMENT
| char_symbol
| COLLSYMBOL
| COLLELEMENT
;


symb_list : symb_list COLLSYMBOL
| COLLSYMBOL
;


locale_name : LOC_NAME
| '"' LOC_NAME '"'
;


/* The following is the LC_CTYPE category grammar */


lc_ctype : ctype_hdr ctype_keywords ctype_tlr
| ctype_hdr 'copy' locale_name EOL ctype_tlr
;


ctype_hdr : 'LC_CTYPE' EOL
;


ctype_keywords : ctype_keywords ctype_keyword
| ctype_keyword
;


ctype_keyword : charclass_keyword charclass_list EOL
| charconv_keyword charconv_list EOL
| 'charclass' charclass_namelist EOL
;


charclass_namelist : charclass_namelist ';' CHARCLASS
| CHARCLASS
;


charclass_keyword : 'upper' | 'lower' | 'alpha' | 'digit'
| 'punct' | 'xdigit' | 'space' | 'print'
| 'graph' | 'blank' | 'cntrl' | 'alnum'
| CHARCLASS
;


charclass_list : charclass_list ';' char_symbol
| charclass_list ';' ELLIPSIS ';' char_symbol
| char_symbol
;


charconv_keyword : 'toupper'
| 'tolower'
;


charconv_list : charconv_list ';' charconv_entry
| charconv_entry
;


charconv_entry : '(' char_symbol ',' char_symbol ')'
;


ctype_tlr : 'END' 'LC_CTYPE' EOL
;


/* The following is the LC_COLLATE category grammar */


lc_collate : collate_hdr collate_keywords collate_tlr
| collate_hdr 'copy' locale_name EOL collate_tlr
;


collate_hdr : 'LC_COLLATE' EOL
;


collate_keywords : order_statements
| opt_statements order_statements
;


opt_statements : opt_statements collating_symbols
| opt_statements collating_elements
| collating_symbols
| collating_elements
;


collating_symbols : 'collating-symbol' COLLSYMBOL EOL
;


collating_elements : 'collating-element' COLLELEMENT
'from' '"' elem_list '"' EOL
;


order_statements : order_start collation_order order_end
;


order_start : 'order_start' EOL
| 'order_start' order_opts EOL
;


order_opts : order_opts ';' order_opt
| order_opt
;


order_opt : order_opt ',' opt_word
| opt_word
;


opt_word : 'forward' | 'backward' | 'position'
;


collation_order : collation_order collation_entry
| collation_entry
;


collation_entry : COLLSYMBOL EOL
| collation_element weight_list EOL
| collation_element EOL
;


collation_element : char_symbol
| COLLELEMENT
| ELLIPSIS
| 'UNDEFINED'
;


weight_list : weight_list ';' weight_symbol
| weight_list ';'
| weight_symbol
;


weight_symbol : /* empty */
| char_symbol
| COLLSYMBOL
| '"' elem_list '"'
| '"' symb_list '"'
| ELLIPSIS
| 'IGNORE'
;


order_end : 'order_end' EOL
;


collate_tlr : 'END' 'LC_COLLATE' EOL
;


/* The following is the LC_MESSAGES category grammar */


lc_messages : messages_hdr messages_keywords messages_tlr
| messages_hdr 'copy' locale_name EOL messages_tlr
;


messages_hdr : 'LC_MESSAGES' EOL
;


messages_keywords : messages_keywords messages_keyword
| messages_keyword
;


messages_keyword : 'yesexpr' '"' EXTENDED_REG_EXP '"' EOL
| 'noexpr' '"' EXTENDED_REG_EXP '"' EOL
;


messages_tlr : 'END' 'LC_MESSAGES' EOL
;


/* The following is the LC_MONETARY category grammar */


lc_monetary : monetary_hdr monetary_keywords monetary_tlr
| monetary_hdr 'copy' locale_name EOL monetary_tlr
;


monetary_hdr : 'LC_MONETARY' EOL
;


monetary_keywords : monetary_keywords monetary_keyword
| monetary_keyword
;


monetary_keyword : mon_keyword_string mon_string EOL
| mon_keyword_char NUMBER EOL
| mon_keyword_char '-1' EOL
| mon_keyword_grouping mon_group_list EOL
;


mon_keyword_string : 'int_curr_symbol' | 'currency_symbol'
| 'mon_decimal_point' | 'mon_thousands_sep'
| 'positive_sign' | 'negative_sign'
;


mon_string : '"' char_list '"'
| '""'
;


mon_keyword_char : 'int_frac_digits' | 'frac_digits'
| 'p_cs_precedes' | 'p_sep_by_space'
| 'n_cs_precedes' | 'n_sep_by_space'
| 'p_sign_posn' | 'n_sign_posn'
| 'int_p_cs_precedes' | 'int_p_sep_by_space'
| 'int_n_cs_precedes' | 'int_n_sep_by_space'
| 'int_p_sign_posn' | 'int_n_sign_posn'
;


mon_keyword_grouping : 'mon_grouping'
;


mon_group_list : NUMBER
| mon_group_list ';' NUMBER
;


monetary_tlr : 'END' 'LC_MONETARY' EOL
;


/* The following is the LC_NUMERIC category grammar */


lc_numeric : numeric_hdr numeric_keywords numeric_tlr
| numeric_hdr 'copy' locale_name EOL numeric_tlr
;


numeric_hdr : 'LC_NUMERIC' EOL
;


numeric_keywords : numeric_keywords numeric_keyword
| numeric_keyword
;


numeric_keyword : num_keyword_string num_string EOL
| num_keyword_grouping num_group_list EOL
;


num_keyword_string : 'decimal_point'
| 'thousands_sep'
;


num_string : '"' char_list '"'
| '""'
;


num_keyword_grouping : 'grouping'
;


num_group_list : NUMBER
| num_group_list ';' NUMBER
;


numeric_tlr : 'END' 'LC_NUMERIC' EOL
;


/* The following is the LC_TIME category grammar */


lc_time : time_hdr time_keywords time_tlr
| time_hdr 'copy' locale_name EOL time_tlr
;


time_hdr : 'LC_TIME' EOL
;


time_keywords : time_keywords time_keyword
| time_keyword
;


time_keyword : time_keyword_name time_list EOL
| time_keyword_fmt time_string EOL
| time_keyword_opt time_list EOL
;


time_keyword_name : 'abday' | 'day' | 'abmon' | 'mon'
;


time_keyword_fmt : 'd_t_fmt' | 'd_fmt' | 't_fmt'
| 'am_pm' | 't_fmt_ampm'
;


time_keyword_opt : 'era' | 'era_d_fmt' | 'era_t_fmt'
| 'era_d_t_fmt' | 'alt_digits'
;


time_list : time_list ';' time_string
| time_string
;


time_string : '"' char_list '"'
;


time_tlr : 'END' 'LC_TIME' EOL
;
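
To make the shape of a category more concrete, here is a minimal Python sketch that reads one LC_NUMERIC section following the lc_numeric rules above: a header line, then either a single copy statement or one or more keyword lines, then the END LC_NUMERIC trailer. The function name parse_lc_numeric and the returned dictionary layout are assumptions made for this example. The sketch does not handle comment lines, line continuation, or a redefined escape character, and it leaves the contents of quoted strings undecoded.

```python
import re

# Keywords that take a quoted string operand (num_keyword_string in the grammar).
STRING_KEYWORDS = {"decimal_point", "thousands_sep"}


def parse_lc_numeric(source):
    """Parse one LC_NUMERIC section into a dict (illustrative sketch only)."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    # numeric_hdr, at least one body line, numeric_tlr.
    if (len(lines) < 3 or lines[0] != "LC_NUMERIC"
            or lines[-1].split() != ["END", "LC_NUMERIC"]):
        raise ValueError("expected LC_NUMERIC, a non-empty body, END LC_NUMERIC")
    body = lines[1:-1]

    # Second alternative of lc_numeric: 'copy' locale_name EOL.
    first = body[0].split(None, 1)
    if first[0] == "copy":
        if len(body) != 1 or len(first) != 2:
            raise ValueError("'copy' must be the only statement and name a locale")
        return {"copy": first[1].strip('"')}

    result = {}
    for line in body:
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"missing operand: {line!r}")
        keyword, operand = parts
        if keyword in STRING_KEYWORDS:
            # num_string: '"' char_list '"' or the empty string '""'.
            match = re.fullmatch(r'"(.*)"', operand)
            if match is None:
                raise ValueError(f"expected a quoted string: {line!r}")
            result[keyword] = match.group(1)
        elif keyword == "grouping":
            # num_group_list: NUMBER tokens separated by semicolons.
            fields = operand.split(";")
            if not all(re.fullmatch(r"[0-9]+", f) for f in fields):
                raise ValueError(f"expected NUMBER;NUMBER...: {line!r}")
            result[keyword] = [int(f) for f in fields]
        else:
            raise ValueError(f"unknown LC_NUMERIC keyword: {keyword!r}")
    return result


example = '''
LC_NUMERIC
decimal_point "<period>"
thousands_sep ""
grouping 3;3
END LC_NUMERIC
'''
print(parse_lc_numeric(example))
```

The other categories follow the same header, keywords-or-copy, trailer pattern, so a fuller reader could reuse this structure with a different keyword table per category.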





UNIX® is a registered trademark of The Open Group.
POSIX® is a registered trademark of The IEEE.