compare.txt | compare.txt | |||
---|---|---|---|---|
Network Working Group C. Newman | Network Working Group C. Newman | |||
Internet-Draft Sun Microsystems | Internet-Draft Sun Microsystems | |||
Expires: April 26, 2004 October 27, 2003 | Expires: February 2, 2007 M. Duerst | |||
AGU | ||||
A. Gulbrandsen | ||||
Oryx | ||||
August 1, 2006 | ||||
Internet Application Protocol Collation Registry | Internet Application Protocol Collation Registry | |||
draft-newman-i18n-comparator-01.txt | draft-newman-i18n-comparator-13.txt | |||
Status of this Memo | Status of this Memo | |||
This document is an Internet-Draft and is in full conformance with | By submitting this Internet-Draft, each author represents that any | |||
all provisions of Section 10 of RFC2026. | applicable patent or other IPR claims of which he or she is aware | |||
have been or will be disclosed, and any of which he or she becomes | ||||
aware will be disclosed, in accordance with Section 6 of BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that other | Task Force (IETF), its areas, and its working groups. Note that | |||
groups may also distribute working documents as Internet-Drafts. | other groups may also distribute working documents as Internet- | |||
Drafts. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at http:// | The list of current Internet-Drafts can be accessed at | |||
www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
This Internet-Draft will expire on April 26, 2004. | This Internet-Draft will expire on February 2, 2007. | |||
Copyright Notice | Copyright Notice | |||
Copyright (C) The Internet Society (2003). All Rights Reserved. | Copyright (C) The Internet Society (2006). | |||
Abstract | Abstract | |||
Many Internet application protocols include string-based lookup, | Many Internet application protocols include string-based lookup, | |||
searching, or sorting operations. However the problem space for | searching, or sorting operations. However the problem space for | |||
searching and sorting international strings is large, not fully | searching and sorting international strings is large, not fully | |||
explored, and is outside the area of expertise for the Internet | explored, and is outside the area of expertise for the Internet | |||
Engineering Task Force (IETF). Rather than attempt to solve such a | Engineering Task Force (IETF). Rather than attempt to solve such a | |||
large problem, this specification creates an abstraction framework so | large problem, this specification creates an abstraction framework so | |||
that application protocols can precisely identify a comparison | that application protocols can precisely identify a comparison | |||
function and the repertoire of comparison functions can be extended | function and the repertoire of comparison functions can be extended | |||
in the future. | in the future. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.1 Conventions Used in this Document . . . . . . . . . . . . . 3 | 1.1. Conventions Used in this Document . . . . . . . . . . . . 4 | |||
2. Collation Definition and Purpose . . . . . . . . . . . . . . 3 | 2. Collation Definition and Purpose . . . . . . . . . . . . . . . 4 | |||
3. Collation Name Syntax . . . . . . . . . . . . . . . . . . . 4 | 2.1. Definition . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
4. Collation Specification Requirements . . . . . . . . . . . . 6 | 2.2. Purpose . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
5. Application Protocol Requirements . . . . . . . . . . . . . 8 | 2.3. Some Other Terms Used in this Document . . . . . . . . . 5 | |||
6. Initial Collations . . . . . . . . . . . . . . . . . . . . . 9 | 2.4. Sort Keys . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
6.1 Octet Collation . . . . . . . . . . . . . . . . . . . . . . 9 | 3. Collation Identifier Syntax . . . . . . . . . . . . . . . . . 6 | |||
6.2 ASCII Numeric Collation . . . . . . . . . . . . . . . . . . 10 | 3.1. Basic Syntax . . . . . . . . . . . . . . . . . . . . . . 6 | |||
6.3 ASCII Casemap Collation . . . . . . . . . . . . . . . . . . 10 | 3.2. Wildcards . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
6.4 Nameprep Collation . . . . . . . . . . . . . . . . . . . . . 11 | 3.3. Ordering Direction . . . . . . . . . . . . . . . . . . . 6 | |||
6.5 Basic Collation . . . . . . . . . . . . . . . . . . . . . . 12 | 3.4. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
7. Use by ACAP and Sieve . . . . . . . . . . . . . . . . . . . 14 | 3.5. Naming Guidelines . . . . . . . . . . . . . . . . . . . . 7 | |||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . 14 | 4. Collation Specification Requirements . . . . . . . . . . . . . 8 | |||
8.1 Collation Registration Procedure . . . . . . . . . . . . . . 14 | 4.1. Collation/Server Interface . . . . . . . . . . . . . . . 8 | |||
8.2 Collation Registration Template . . . . . . . . . . . . . . 15 | 4.2. Operations Supported . . . . . . . . . . . . . . . . . . 8 | |||
8.3 Octet Collation Registration . . . . . . . . . . . . . . . . 16 | 4.2.1. Validity . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
8.4 ASCII Numeric Collation Registration . . . . . . . . . . . . 16 | 4.2.2. Equality . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
8.5 Legacy English Casemap Collation Registration . . . . . . . 16 | 4.2.3. Substring . . . . . . . . . . . . . . . . . . . . . . 9 | |||
8.6 English Casemap Collation Registration . . . . . . . . . . . 16 | 4.2.4. Ordering . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
8.7 Nameprep Collation Registration . . . . . . . . . . . . . . 17 | 4.3. Sort Keys . . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
8.8 Basic Collation Registration . . . . . . . . . . . . . . . . 17 | 4.4. Use of Lookup Tables . . . . . . . . . . . . . . . . . . 10 | |||
8.9 Basic Accent Sensitive Match Collation Registration . . . . 17 | 5. Application Protocol Requirements . . . . . . . . . . . . . . 11 | |||
8.10 Basic Case Sensitive Match Collation Registration . . . . . 18 | 5.1. Character Encoding . . . . . . . . . . . . . . . . . . . 11 | |||
8.11 Structure of Collation Registry . . . . . . . . . . . . . . 18 | 5.2. Operations . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
8.12 Example Initial Registry Summary . . . . . . . . . . . . . . 19 | 5.3. Wildcards . . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
9. DTD for Collation Registration . . . . . . . . . . . . . . . 19 | 5.4. Canonicalization Function . . . . . . . . . . . . . . . . 12 | |||
10. Guidelines for Expert Reviewer . . . . . . . . . . . . . . . 20 | 5.5. Disconnected Clients . . . . . . . . . . . . . . . . . . 12 | |||
11. Security Considerations . . . . . . . . . . . . . . . . . . 21 | 5.6. Error Codes . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
12. Open Issues . . . . . . . . . . . . . . . . . . . . . . . . 21 | 5.7. Octet Collation . . . . . . . . . . . . . . . . . . . . . 13 | |||
13. Changes From -00 . . . . . . . . . . . . . . . . . . . . . . 22 | 6. Use by Existing Protocols . . . . . . . . . . . . . . . . . . 13 | |||
Normative References . . . . . . . . . . . . . . . . . . . . 22 | 7. Collation Registration . . . . . . . . . . . . . . . . . . . . 13 | |||
Informative References . . . . . . . . . . . . . . . . . . . 23 | 7.1. Collation Registration Procedure . . . . . . . . . . . . 13 | |||
Author's Address . . . . . . . . . . . . . . . . . . . . . . 24 | 7.2. Collation Registration Format . . . . . . . . . . . . . . 14 | |||
Intellectual Property and Copyright Statements . . . . . . . 25 | 7.2.1. Registration Template . . . . . . . . . . . . . . . . 14 | |||
7.2.2. The collation Element . . . . . . . . . . . . . . . . 15 | ||||
7.2.3. The identifier Element . . . . . . . . . . . . . . . . 15 | ||||
7.2.4. The title Element . . . . . . . . . . . . . . . . . . 15 | ||||
7.2.5. The operations Element . . . . . . . . . . . . . . . . 15 | ||||
7.2.6. The specification Element . . . . . . . . . . . . . . 15 | ||||
7.2.7. The submitter Element . . . . . . . . . . . . . . . . 16 | ||||
7.2.8. The owner Element . . . . . . . . . . . . . . . . . . 16 | ||||
7.2.9. The version Element . . . . . . . . . . . . . . . . . 16 | ||||
7.2.10. The variable Element . . . . . . . . . . . . . . . . . 16 | ||||
7.2.11. The name Element . . . . . . . . . . . . . . . . . . . 16 | ||||
7.2.12. The default Element . . . . . . . . . . . . . . . . . 16 | ||||
7.2.13. The value Element . . . . . . . . . . . . . . . . . . 17 | ||||
7.3. Structure of Collation Registry . . . . . . . . . . . . . 17 | ||||
7.4. Example Initial Registry Summary . . . . . . . . . . . . 18 | ||||
8. Guidelines for Expert Reviewer . . . . . . . . . . . . . . . . 18 | ||||
9. Initial Collations . . . . . . . . . . . . . . . . . . . . . . 19 | ||||
9.1. ASCII Numeric Collation . . . . . . . . . . . . . . . . . 19 | ||||
9.1.1. ASCII Numeric Collation Description . . . . . . . . . 19 | ||||
9.1.2. ASCII Numeric Collation Registration . . . . . . . . . 20 | ||||
9.2. ASCII Casemap Collation . . . . . . . . . . . . . . . . . 20 | ||||
9.2.1. ASCII Casemap Collation Description . . . . . . . . . 20 | ||||
9.2.2. ASCII Casemap Collation Registration . . . . . . . . . 21 | ||||
9.3. Nameprep Collation . . . . . . . . . . . . . . . . . . . 21 | ||||
9.3.1. Nameprep Collation Description . . . . . . . . . . . . 21 | ||||
9.3.2. Nameprep Collation Registration . . . . . . . . . . . 22 | ||||
9.4. Octet Collation . . . . . . . . . . . . . . . . . . . . . 22 | ||||
9.4.1. Octet Collation Description . . . . . . . . . . . . . 22 | ||||
9.4.2. Octet Collation Registration . . . . . . . . . . . . . 23 | ||||
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 | ||||
11. Security Considerations . . . . . . . . . . . . . . . . . . . 23 | ||||
12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 23 | ||||
13. Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . 23 | ||||
14. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 24 | ||||
14.1. Changes From -12 . . . . . . . . . . . . . . . . . . . . 24 | ||||
14.2. Changes From -11 . . . . . . . . . . . . . . . . . . . . 24 | ||||
14.3. Changes From -10 . . . . . . . . . . . . . . . . . . . . 24 | ||||
14.4. Changes From -09 . . . . . . . . . . . . . . . . . . . . 24 | ||||
14.5. Changes From -08 . . . . . . . . . . . . . . . . . . . . 25 | ||||
14.6. Changes From -06 . . . . . . . . . . . . . . . . . . . . 26 | ||||
14.7. Changes From -05 . . . . . . . . . . . . . . . . . . . . 26 | ||||
14.8. Changes From -04 . . . . . . . . . . . . . . . . . . . . 26 | ||||
14.9. Changes From -03 . . . . . . . . . . . . . . . . . . . . 26 | ||||
14.10. Changes From -02 . . . . . . . . . . . . . . . . . . . . 27 | ||||
14.11. Changes From -01 . . . . . . . . . . . . . . . . . . . . 27 | ||||
14.12. Changes From -00 . . . . . . . . . . . . . . . . . . . . 27 | ||||
15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 28 | ||||
15.1. Normative References . . . . . . . . . . . . . . . . . . 28 | ||||
15.2. Informative References . . . . . . . . . . . . . . . . . 28 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 30 | ||||
Intellectual Property and Copyright Statements . . . . . . . . . . 31 | ||||
1. Introduction | 1. Introduction | |||
The ACAP [11] specification introduced the concept of a comparator | The ACAP [12] specification introduced the concept of a comparator | |||
(which we call collation in this document), but failed to create an | (which we call collation in this document), but failed to create an | |||
IANA registry. With the introduction of stringprep [6] and the | IANA registry. With the introduction of stringprep [6] and the | |||
Unicode Collation Algorithm [8], it is now time to create that | Unicode Collation Algorithm [8], it is now time to create that | |||
registry and populate it with some initial values appropriate for an | registry and populate it with some initial values appropriate for an | |||
international community. This specification replaces and generalizes | international community. This specification replaces and generalizes | |||
the definition of a comparator in ACAP and creates a collation | the definition of a comparator in ACAP and creates a collation | |||
registry. | registry. | |||
1.1 Conventions Used in this Document | 1.1. Conventions Used in this Document | |||
The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" | The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" | |||
in this document are to be interpreted as defined in "Key words for | in this document are to be interpreted as defined in "Key words for | |||
use in RFCs to Indicate Requirement Levels" [1]. | use in RFCs to Indicate Requirement Levels" [1]. | |||
The attribute syntax specifications use the Augmented Backus-Naur | The attribute syntax specifications use the Augmented Backus-Naur | |||
Form (ABNF) [2] notation including the core rules defined in Appendix | Form (ABNF) [2] notation including the core rules defined in Appendix | |||
A. This also inherits ABNF rules from Language Tags [5]. | A. This also inherits ABNF rules from Language Tags [5]. | |||
2. Collation Definition and Purpose | ||||
2. Collation Definition and Purpose | 2.1. Definition | |||
A collation is a named function which takes two arbitrary length | A collation is a named function which takes two arbitrary length | |||
octet strings (encoded in UTF-8 [3] for collations which operate on | strings as input and can be used to perform one or more of three | |||
characters) as input and can be used to perform one or more of three | basic comparison operations: equality test, substring match, and | |||
basic comparison operations: equality test, substring match and | ||||
ordering test. | ordering test. | |||
Collations provide a multi-protocol abstraction layer for comparison | 2.2. Purpose | |||
functions so the details of a particular comparison operation can be | ||||
specified by someone with appropriate expertise independent of the | ||||
application protocol that consumes that collation. This is similar | ||||
to the way a charset [14] separates the details of octet to character | ||||
mapping from a protocol specification such as MIME [9] or the way | ||||
SASL [10] separates the details of an authentication mechanism from a | ||||
protocol specification such as ACAP [11]. | ||||
Collations are an abstraction layer for comparison functions so that these | ||||
comparison functions can be used in multiple protocols. The details | ||||
of a particular comparison operation can be specified by someone with | ||||
appropriate expertise independent of the application protocols that | ||||
use that collation. This is similar to the way a charset [14] | ||||
separates the details of octet to character mapping from a protocol | ||||
specification such as MIME [10] or the way SASL [11] separates the | ||||
details of an authentication mechanism from a protocol specification | ||||
such as ACAP [12]. | ||||
Here a small diagram to help illustrate the value of this abstraction | Here is a small diagram to help illustrate the value of this | |||
layer: | abstraction layer: | |||
`                                              +-----------------+` | `+-------------------+                         +-----------------+` | |||
`                                              \|      Octet      \|` | `\| IMAP i18n SEARCH  \|--+                      \| Basic           \|` | |||
`+-------------------+                      +--\| Collation Spec  \|` | `+-------------------+  \|                   +--\| Collation Spec  \|` | |||
`\| IMAP i18n SEARCH  \|--+                   \|  +-----------------+` | `                       \|                   \|  +-----------------+` | |||
`+-------------------+  \|  +-------------+  \|  +-----------------+` | `+-------------------+  \|  +-------------+  \|  +-----------------+` | |||
`                       +--\|  Collation  \|--+--\| A stringprep    \|` | `\| ACAP i18n SEARCH  \|--+--\|  Collation  \|--+--\| A stringprep    \|` | |||
`+-------------------+  \|  \|  Registry   \|  \|  \| Collation Spec  \|` | `+-------------------+  \|  \|  Registry   \|  \|  \| Collation Spec  \|` | |||
`\| ACAP i18n SEARCH  \|--+  +-------------+  \|  +-----------------+` | `                       \|  +-------------+  \|  +-----------------+` | |||
`+-------------------+                      \|  +-----------------+` | `+-------------------+  \|                   \|  +-----------------+` | |||
`                                           \|  \| locale-specific \|` | `\| ...other protocol \|--+                   \|  \| locale-specific \|` | |||
`                                           +--\| Collation Spec  \|` | `+-------------------+                      +--\| Collation Spec  \|` | |||
`                                              +-----------------+` | `                                              +-----------------+` | |||
Thus IMAP, ACAP and future application protocols with international | Thus IMAP, ACAP and future application protocols with international | |||
search capability simply specify how to interface to the collation | search capability simply specify how to interface to the collation | |||
registry instead of each protocol spec having to specify all the | registry instead of each protocol specification having to specify all | |||
collations it supports. | the collations it supports. | |||
2.3. Some Other Terms Used in this Document | ||||
The terms client, server and protocol are used in somewhat unusual | ||||
senses. | ||||
Client means a user, or a program acting directly on behalf of a | ||||
user. This may be a mail reader acting as an IMAP client, or it may | ||||
be an interactive shell where the user can type protocol directly, or | ||||
it may be a script or program written by the user. | ||||
Server means a program that performs services requested by the | ||||
client. This may be a traditional server such as an HTTP server, or | ||||
it may be a Sieve [15] interpreter running a Sieve script written by | ||||
a user. A server needs to use the operations provided by collations | ||||
in order to fulfil the client's requests. | ||||
The protocol describes how the client tells the server what it wants | ||||
done, and (if applicable) how the server tells the client about the | ||||
results. IMAP is a protocol by this definition, and so is the Sieve | ||||
language. | ||||
2.4. Sort Keys | ||||
One component of a collation is a transformation which turns a string | ||||
into a sort key, which is then used while sorting. | ||||
The transformation can range from an identity mapping (e.g., the | ||||
i;octet collation in Section 9.4) to a mapping which makes the string | ||||
unreadable to a human. | ||||
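The sort-key transformation described above can be sketched in Python (an illustrative choice; the draft itself defines no code). The sketch assumes two transformations: `octet_key`, the identity mapping the text attributes to i;octet, and `ascii_fold_key`, a hypothetical ASCII case-folding key that is not taken from any registered collation. The helper `sort_with_key` is likewise hypothetical.

```python
from typing import Callable, List


def octet_key(s: bytes) -> bytes:
    # Identity transformation: the sort key is the string itself.
    return s


def ascii_fold_key(s: bytes) -> bytes:
    # Hypothetical transformation for illustration only: ASCII a-z become
    # A-Z and all other octets are unchanged (bytes.upper() is ASCII-only).
    return s.upper()


def sort_with_key(items: List[bytes], key: Callable[[bytes], bytes]) -> List[bytes]:
    # sorted() computes each key once, then orders items by comparing keys.
    return sorted(items, key=key)


words = [b"apple", b"Banana", b"cherry"]
print(sort_with_key(words, octet_key))       # [b'Banana', b'apple', b'cherry']
print(sort_with_key(words, ascii_fold_key))  # [b'apple', b'Banana', b'cherry']
```

With the identity key "Banana" sorts first because uppercase octets precede lowercase ones; with the folded key the order becomes alphabetical. Since the key format is implementation-defined, a server would keep such keys internal.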
One component of a collation is a canonicalization function which can | This is an implementation detail of collations or servers. A | |||
be pre-applied to single strings and may enhance the performance of | protocol SHOULD NOT expose it, since some collations leave the sort | |||
subsequent comparison operations. Normally, this is an | key's format up to the implementation, and current conformant | |||
implementation detail of collations, but at times it may be useful | implementations are known to use different formats. | |||
for an application protocol to expose collation canonicalization over | ||||
protocol. Collation canonicalization can range from an identity | 3. Collation Identifier Syntax | |||
mapping (e.g., the i;octet collation) to a mapping which makes the | ||||
string unreadable to a human (e.g., the basic collation). | 3.1. Basic Syntax | |||
3. Collation Name Syntax | The collation identifier itself is a single US-ASCII string beginning | |||
with a letter and made up of letters, digits, and any of the | ||||
The collation name itself is a single US-ASCII string beginning with | following 4 symbols: "-", ";", "=" and ".". The identifier MUST NOT | |||
a letter and made up of letters, digits, or one of the following 4 | be longer than 254 characters. | |||
symbols: "-", ";", "=" or ".". The name MUST NOT be longer than 254 | ||||
characters. | ||||
collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "." | collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "." | |||
collation-name = ALPHA *253collation-char | collation-id = ALPHA *253collation-char | |||
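As a rough sketch, the collation-id rule can be checked with a Python regular expression. The names `COLLATION_ID` and `is_collation_id` are hypothetical, and the pattern simply transcribes the ABNF above; it has not been checked against any published implementation.

```python
import re

# collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "."
# collation-id   = ALPHA *253collation-char   (so at most 254 characters)
COLLATION_ID = re.compile(r"[A-Za-z][A-Za-z0-9\-;=.]{0,253}")


def is_collation_id(s: str) -> bool:
    # fullmatch anchors the pattern at both ends of the string, so the
    # whole identifier must conform, including the length limit.
    return COLLATION_ID.fullmatch(s) is not None
```

For example, "i;ascii-casemap" is accepted, while "i;*" is rejected because "*" is legal only in the wildcard form. "default" also passes this syntactic check; its reserved meaning is a protocol-level matter, not part of the syntax.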
The string a client uses to select a collation MAY contain a wildcard | The identifier "default" is reserved. For protocols which have a | |||
("*") character which matches zero or more collation-chars. Wildcard | default collation, "default" refers to that collation. For other | |||
characters MUST NOT be adjacent. Clients which support disconnected | protocols, the identifier "default" matches no collations, and | |||
operation SHOULD NOT use wildcards to select a collation, but clients | servers SHOULD treat it in the same way as they treat nonexistent | |||
which provide collation operations only when connected to the server | collations. | |||
MAY use wildcards. If the wildcard string matches multiple | ||||
collations, the server SHOULD select the collation with the broadest | 3.2. Wildcards | |||
scope (preferably international scope), the most recent table | ||||
versions and the greatest number of supported operations. A single | The string a client uses to select a collation MAY contain one or | |||
wildcard character ("*") refers to the application protocol collation | more wildcard ("*") characters, each of which matches zero or more collation- | |||
behavior that would occur if no explicit negotiation were used. | chars. Wildcard characters MUST NOT be adjacent. If the wildcard | |||
string matches multiple collations, the server SHOULD select the | ||||
When used as a protocol element for ordering, the collation name MAY | collation with the broadest scope (preferably international scope), | |||
be prefixed by either "+" or "-" to explicitly specify an ordering | the most recent table versions and the greatest number of supported | |||
direction. As mentioned previously, "+" has no effect on the | operations. | |||
ordering function, while "-" negates the result of the ordering | ||||
function. In general, collation-order is used when a client requests | ||||
a collation, and collation-sel is used with the server informs the | ||||
client of the selected collation. | ||||
collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"]) | collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"]) | |||
; MUST NOT exceed 255 characters total | ; MUST NOT exceed 254 characters total | |||
collation-sel = ["+" / "-"] collation-name | 3.3. Ordering Direction | |||
When used as a protocol element for ordering, the collation | ||||
identifier MAY be prefixed by either "+" or "-" to explicitly specify | ||||
an ordering direction. "+" has no effect on the ordering operation, | ||||
while "-" inverts the result of the ordering operation. In general, | ||||
collation-order is used when a client requests a collation, and | ||||
collation-selected is used when the server informs the client of the | ||||
selected collation. | ||||
collation-selected = ["+" / "-"] collation-id | ||||
collation-order = ["+" / "-"] collation-wild | collation-order = ["+" / "-"] collation-wild | |||
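The wildcard and ordering-direction rules can be combined in one sketch: strip an optional "+" or "-", check the remainder against collation-wild, then match it against a list of known identifiers. All function names here are hypothetical. The server's preference among several matches (broadest scope, newest tables, most operations) is not modeled, and the sketch assumes ordering results are represented as the integers -1, 0 and +1.

```python
import re
from typing import Iterable, List, Tuple

# collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "."
CHAR = r"[A-Za-z0-9\-;=.]"
# collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"])
# Every "*" must be followed by a collation-char, so "**" cannot match.
WILD = re.compile(r"(?:\*|[A-Za-z]\*?)(?:" + CHAR + r"\*?)*")


def parse_collation_order(token: str) -> Tuple[int, str]:
    """Split collation-order into (direction, collation-wild).

    Direction is +1 for no prefix or "+", and -1 for "-".
    """
    direction = 1
    if token[:1] in ("+", "-"):
        direction = -1 if token[0] == "-" else 1
        token = token[1:]
    # collation-wild MUST NOT exceed 254 characters total.
    if len(token) > 254 or WILD.fullmatch(token) is None:
        raise ValueError(f"not a valid collation-wild: {token!r}")
    return direction, token


def wildcard_matches(wild: str, registered: Iterable[str]) -> List[str]:
    """Return the registered identifiers matched by a wildcard string."""
    # "*" matches zero or more collation-chars; anything else is literal.
    pattern = "".join(CHAR + "*" if ch == "*" else re.escape(ch) for ch in wild)
    rx = re.compile(pattern)
    return [name for name in registered if rx.fullmatch(name)]


def apply_direction(direction: int, ordering_result: int) -> int:
    # "+" leaves an ordering result unchanged; "-" inverts it.
    return direction * ordering_result
```

For example, "-i;ascii-*" parses to direction -1 and matches both i;ascii-casemap and i;ascii-numeric, leaving the final choice to the server.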
Some protocols are designed to use URIs to refer to collations rather | 3.4. URIs | |||
than simple tokens. A special section of the IANA web page is | ||||
Some protocols are designed to use URIs [4] to refer to collations | ||||
rather than simple tokens. A special section of the IANA web page is | ||||
reserved for such usage. The "collation-uri" form is used to refer | reserved for such usage. The "collation-uri" form is used to refer | |||
to a specific IANA registry entry for a specific named collation (the | to a specific IANA registry entry for a specific named collation (the | |||
collation registration may not actually be present if it is | collation registration may not actually be present if it is | |||
experimental). The "collation-auri" form is an abstract name for an | experimental). The "collation-auri" form is an abstract name for an | |||
ordering, a comparator pattern or a vendor private comparator. | ordering, a collation pattern or a vendor private collator. | |||
collation-uri = "http://www.iana.org/assignments/collation/" | collation-uri = "http://www.iana.org/assignments/collation/" | |||
collation-name ".xml" | collation-id ".xml" | |||
collation-auri = ( "http://www.iana.org/assignments/collation/" | collation-auri = ( "http://www.iana.org/assignments/collation/" | |||
collation-order [".xml"]) / other-uri | collation-order ".xml" ) / other-uri | |||
other-uri = absoluteURI | other-uri = <absoluteURI> | |||
; excluding the IANA collation namespace. | ; excluding the IANA collation namespace. | |||
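A minimal sketch of mapping between a collation identifier and its collation-uri form, assuming plain string concatenation is sufficient (every character allowed in collation-char is also legal in a URI path). `collation_uri` and `name_from_iana_uri` are hypothetical names. The second function also accepts collation-auri values such as wildcard patterns, since it does not validate what lies between the prefix and ".xml".

```python
from typing import Optional

IANA_PREFIX = "http://www.iana.org/assignments/collation/"


def collation_uri(collation_id: str) -> str:
    # collation-uri = "http://www.iana.org/assignments/collation/"
    #                 collation-id ".xml"
    return IANA_PREFIX + collation_id + ".xml"


def name_from_iana_uri(uri: str) -> Optional[str]:
    """Return the text between the IANA prefix and ".xml", or None.

    None means the URI is not in the IANA collation form (for example,
    an other-uri), so the caller must resolve it some other way.
    """
    if uri.startswith(IANA_PREFIX) and uri.endswith(".xml"):
        return uri[len(IANA_PREFIX):-len(".xml")]
    return None
```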
3.5. Naming Guidelines | ||||
While this specification makes no absolute requirements on the | While this specification makes no absolute requirements on the | |||
structure of collation names, naming consistency is important, so the | structure of collation identifiers, naming consistency is important, | |||
following initial guidelines are provided. | so the following initial guidelines are provided. | |||
Collation names with an international audience typically begin with | Collation identifiers with an international audience typically begin | |||
"i;". Collation names intended for a particular language or locale | with "i;". Collation identifiers intended for a particular language | |||
typically begin with a language tag [5] followed by a ";". After the | or locale typically begin with a language tag [5] followed by a ";". | |||
first ";" is normally the name of the general collation algorithm | After the first ";" is normally the name of the general collation | |||
followed by a series of algorithm modifications separated by the ";" | algorithm, followed by a series of algorithm modifications separated | |||
delimiter. Parameterized modifications will use "=" to delimit the | by the ";" delimiter. Parameterized modifications will use "=" to | |||
parameter from the value. The version numbers of any lookup tables | delimit the parameter from the value. The version numbers of any | |||
used by the algorithm SHOULD be present as parameterized | lookup tables used by the algorithm SHOULD be present as | |||
modifications. | parameterized modifications. | |||
Collation names of the form *;vnd-domain.com;* are reserved for | Collation identifiers of the form *;vnd-domain.com;* are reserved for | |||
vendor-specific collations created by the owner of the domain name | vendor-specific collations created by the owner of the domain name | |||
following the "vnd-" prefix. Registration of such collations (or the | following the "vnd-" prefix (e.g. vnd-example.com for the vendor | |||
name space as a whole) with intended use of "Vendor" is encouraged | example.com). Registration of such collations (or the name space as | |||
when a public specification or open-source implementation is | a whole) with intended use of "Vendor" is encouraged when a public | |||
available, but is not required. | specification or open-source implementation is available, but is not | |||
required. | ||||
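Because these are guidelines rather than requirements, any parser can only guess at an identifier's structure. The following hypothetical Python sketch splits an identifier on ";" and classifies each component as a vendor domain, a parameterized modification, the algorithm name, or another modification. `NameParts` and `split_collation_id` are invented names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NameParts:
    scope: str                       # "i" (international) or a language tag
    algorithm: Optional[str] = None  # general collation algorithm
    modifiers: List[str] = field(default_factory=list)
    params: Dict[str, str] = field(default_factory=dict)
    vendor: Optional[str] = None     # domain taken from a "vnd-" component


def split_collation_id(collation_id: str) -> NameParts:
    """Split an identifier along the naming guidelines (a heuristic only)."""
    first, *rest = collation_id.split(";")
    parts = NameParts(scope=first)
    for component in rest:
        if component.startswith("vnd-") and parts.vendor is None:
            parts.vendor = component[len("vnd-"):]
        elif "=" in component:
            # Parameterized modification: "=" separates parameter and value.
            key, value = component.split("=", 1)
            parts.params[key] = value
        elif parts.algorithm is None:
            parts.algorithm = component
        else:
            parts.modifiers.append(component)
    return parts
```

For instance, the invented identifier "de;vnd-example.com;phonebook;table=2;strict" would yield scope "de", vendor "example.com", algorithm "phonebook", parameter table=2 and modifier "strict".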
4. Collation Specification Requirements | ||||
4.1. Collation/Server Interface | ||||
The collation itself defines what it operates on. Most collations | ||||
are expected to operate on character strings. The i;octet | ||||
(Section 9.4) collation operates on octet strings. The i;ascii- | ||||
numeric (Section 9.1) collation operates on numbers. | ||||
This specification defines the collation interface in terms of octet | ||||
strings. However, implementations may choose to use character | ||||
strings instead. Such implementations may not be able to implement | ||||
e.g. i;octet. Since i;octet is not currently mandatory to implement | ||||
for any protocol, this should not be a problem. | ||||
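The octet-string interface described above might look like this in Python, using `typing.Protocol` for the interface. `Collation`, `OctetCollation` and `sort_order` are hypothetical names, `None` stands in for "undefined", and the ordering assumes plain octet-by-octet comparison with a proper prefix sorting first; the i;octet description in Section 9.4 is the normative source.

```python
from typing import Optional, Protocol


class Collation(Protocol):
    # Hypothetical server-side interface in terms of octet strings.
    # None stands for the "undefined" result.
    def equal(self, a: bytes, b: bytes) -> Optional[bool]: ...
    def substring(self, needle: bytes, haystack: bytes) -> Optional[bool]: ...
    def order(self, a: bytes, b: bytes) -> Optional[int]: ...


class OctetCollation:
    """Sketch of a collation over raw octets, in the spirit of i;octet.

    Every octet string is valid input, so no operation is undefined.
    """

    def equal(self, a: bytes, b: bytes) -> Optional[bool]:
        return a == b

    def substring(self, needle: bytes, haystack: bytes) -> Optional[bool]:
        return needle in haystack

    def order(self, a: bytes, b: bytes) -> Optional[int]:
        # bytes compare octet by octet; a proper prefix sorts first.
        return (a > b) - (a < b)


def sort_order(c: Collation, a: bytes, b: bytes) -> Optional[int]:
    # Server code depends only on the interface, not on a concrete collation.
    return c.order(a, b)
```

An implementation that stores strings as decoded text instead of bytes could not pass arbitrary octets, such as invalid UTF-8, through this interface without some escaping scheme, which is the difficulty the text notes for i;octet.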
4. Collation Specification Requirements | 4.2. Operations Supported | |||
A collation specification MUST state which of the three basic | A collation specification MUST state which of the three basic | |||
functions are supported (equality, substring, ordering) and how to | operations are supported (equality, substring, ordering) and how to | |||
perform each of the supported functions on any two input | perform each of the supported operations on any two input character | |||
octet-strings including empty strings. Given a collation with a | strings including empty strings. Collations must be deterministic, | |||
specific name, and any two fixed input strings, the result MUST be | i.e. given a collation with a specific identifier, and any two fixed | |||
the same. The collation specification MUST state whether the | input strings, the result MUST be the same for the same operation. | |||
collation operates on raw octets or on characters (in which case the | ||||
UTF-8 charset is presumed). Collations MUST be transitive. | In general, collation operations should behave as their names | |||
suggest. While a collation may be new, the operations are not, so | ||||
A collation specification MUST describe the internal canonicalization | the new collation's operations should be similar to those of older | |||
algorithm. This algorithm can be applied to individual strings and | collations. For example, a date/time collation should not provide a | |||
the result strings can be stored to potentially optimize future | "substring" operation that would morph IMAP substring SEARCH into | |||
comparison operations. A collation MAY specify that the | e.g. a date-range search. | |||
canonicalization algorithm is the identity function. The output of | ||||
the canonicalization algorithm MAY have no meaning to a human. | A nonobvious consequence of the rules for each collation operation is | |||
that for any single collation, either none or all of the operations | ||||
Collations which use more than one customizable lookup table in a | can return "undefined". For example, it is not possible to have an | |||
documented format MUST assign numbers to the tables they use. This | equality operation that never returns "undefined" and a substring | |||
permits an application protocol command to access the tables used by | operation that occasionally does. | |||
a server collation. | ||||
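The determinism and consistency rules can be illustrated with a hypothetical collation that accepts only valid UTF-8 and compares ASCII letters case-insensitively. Every operation derives from the same key function `_key`, so the results depend only on the inputs, the ordering result is 0 exactly when equality reports a match, and either all three operations or none of them return "undefined" (`None`). The collation and all names are invented for this sketch.

```python
import string
from typing import Optional

# Hypothetical collation: input must be valid UTF-8; ASCII letters are
# compared without regard to case, every other character exactly.
_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def _key(s: bytes) -> Optional[str]:
    try:
        return s.decode("utf-8").translate(_FOLD)
    except UnicodeDecodeError:
        return None  # invalid input


def equal(a: bytes, b: bytes) -> Optional[bool]:
    ka, kb = _key(a), _key(b)
    if ka is None or kb is None:
        return None  # "undefined"
    return ka == kb


def substring(needle: bytes, haystack: bytes) -> Optional[bool]:
    kn, kh = _key(needle), _key(haystack)
    if kn is None or kh is None:
        return None  # "undefined"
    return kn in kh


def order(a: bytes, b: bytes) -> Optional[int]:
    ka, kb = _key(a), _key(b)
    if ka is None or kb is None:
        return None  # "undefined"
    return (ka > kb) - (ka < kb)
```

Because the fold maps each character to exactly one character, a match in the folded haystack corresponds to a substring of the original that is equal to the needle, which keeps the substring operation consistent with equality.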
4.2.1. Validity | ||||
o The equality function always returns "match" or "no-match" when | ||||
supplied valid input and MAY return "error" if the input strings | The validity test takes one string as argument and returns valid if | |||
are not valid UTF-8 strings or violate other collation | its input string is valid input to the collation's other operations, and | |||
constraints. | invalid if not. (In other words, a string is valid if it is equal to | |||
itself according to the collation's equality operation.) | ||||
o The substring matching function determines if the first string is | ||||
a substring of the second string. A collation which supports | The validity test is provided by all collations. It MUST NOT be | |||
substring matching will automatically support the two special | listed separately in the collation registration. | |||
cases of substring matching: prefix and suffix matching if those | ||||
special cases are supported by the application protocol. It | 4.2.2. Equality | |||
returns "match" or "no-match" when supplied valid input and | ||||
returns "error" when supplied invalid input. | The equality test always returns "match" or "no-match" when supplied | |||
valid input, and MAY return "undefined" if one or both input strings | ||||
o The ordering function determines how two octet strings are | are not valid. | |||
ordered. It returns "-1" if the first string is listed before the | ||||
second string according to the collation, "+1" if the second | The equality test MUST be reflexive and symmetric. For valid input, | |||
string is listed before the first string, and "0" if the two | it MUST be transitive. | |||
strings are equal. If the order of the two strings is reversed, | ||||
the result of the ordering function of the collation MUST be | If a collation provides either a substring or an ordering test, it | |||
negated. In general, collations SHOULD NOT return "0" unless the | MUST also provide an equality test. The substring and/or ordering | |||
two octet sequences are identical. | tests MUST be consistent with the equality test. | |||
Since ordering is normally used to sort a list of items, "error" | In this specification, the return values of the equality test are | |||
is not a useful return value from the ordering function. Strings | called "match", "no-match" and "undefined". This is not a | |||
with errors that prevent the sorting algorithm from functioning | specification, merely a choice of phrasing. | |||
correctly should sort to the end of the list. Thus if the first | ||||
string is invalid UTF-8 while the second string is valid, the | 4.2.3. Substring | |||
result will be "+1". If the second string is invalid UTF-8 while | ||||
the first string is valid, the result will be "-1". If the | The substring matching operation determines if the first string is a | |||
collation is character-based, and both strings are invalid UTF-8, | substring of the second string, i.e. if one or more substrings of the | |||
the result SHOULD match the result from the "i;octet" collation. | second string are equal to the first, as defined by the collation's | |||
equality operation. | ||||
When the collation is used with a "+" prefix, the behavior is the | ||||
same as when used with no prefix. When the collation is used with | A collation which supports substring matching will automatically | |||
a "-" prefix, results which would be "+1" are instead "-1" and | support two special cases of substring matching: prefix and suffix | |||
results which would be "-1" are instead "+1". | matching if those special cases are supported by the application | |||
protocol. It returns "match" or "no-match" when supplied valid input | ||||
Unless otherwise specified by the collation or application protocol, | and returns "undefined" when supplied invalid input. | |||
a NULL string (as opposed to an empty string) is equal only to | ||||
another NULL string, a NULL string is not a substring of any other | ||||
string, and a NULL string sorts to a position after all non-NULL | ||||
strings, but before strings which generate errors. | ||||
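The newer text (right column) defines validity (a string that is equal to itself) and substring matching (some substring of the second string equals the first) purely in terms of the equality operation. The following is a deliberately brute-force Python sketch of those two definitions, not a recommended implementation: the equality function is supplied by the caller, and the names `is_valid` and `substring_from_equality` are hypothetical. Every slice of the second string is tried, so collations that equate strings of different lengths are handled, at the cost of O(n^2) equality calls.

```python
from typing import Callable, Literal

Result = Literal["match", "no-match", "undefined"]
Equality = Callable[[bytes, bytes], Result]


def is_valid(equality: Equality, s: bytes) -> bool:
    # A string is valid if it is equal to itself under the collation.
    return equality(s, s) == "match"


def substring_from_equality(equality: Equality,
                            needle: bytes, haystack: bytes) -> Result:
    """Return "match" if some substring of haystack equals needle."""
    if not (is_valid(equality, needle) and is_valid(equality, haystack)):
        return "undefined"              # invalid input
    for start in range(len(haystack) + 1):
        for end in range(start, len(haystack) + 1):
            # Slices of every length are tried, not only len(needle).
            if equality(needle, haystack[start:end]) == "match":
                return "match"
    return "no-match"
```

For example, with an equality that compares whitespace-separated tokens, `substring_from_equality(eq, b" 1 ", b"a\t1\tb")` is "match" even though the octets of the first string never occur in the second.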
Some application protocols will permit the use of multi-value | ||||
attributes with a collation. This paragraph describes the rules that | ||||
apply unless otherwise specified by the collation or application | ||||
protocol. The equality and substring collation algorithms will be | ||||
iterated over each pair of single values from the two inputs. If any | ||||
combination produces an error, the result is an error. Otherwise, if | ||||
any combination produces a "match", the result is a match. Otherwise | ||||
the result is "no-match". For the ordering function, the smallest | ||||
ordinal octet string from the first set of values is compared to the | ||||
smallest ordinal octet string from the second set of values. | ||||
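The default multi-value rules in the older text (left column) can be sketched as below. The sketch assumes single-value matchers that return "match", "no-match" or "error" and an ordering function that returns -1, 0 or +1, and it reads "smallest ordinal octet string" as the minimum under i;octet ordering. The function names are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, Literal

Result = Literal["match", "no-match", "error"]


def multi_match(match: Callable[[bytes, bytes], Result],
                first: Iterable[bytes], second: Iterable[bytes]) -> Result:
    """Combine single-value results over every pair of values."""
    results = [match(a, b) for a, b in product(first, second)]
    if "error" in results:        # any error makes the whole result an error
        return "error"
    if "match" in results:        # otherwise a single match is enough
        return "match"
    return "no-match"


def multi_order(order: Callable[[bytes, bytes], int],
                first: Iterable[bytes], second: Iterable[bytes]) -> int:
    """Order two value sets by their smallest members (sets must be non-empty)."""
    # Python orders bytes octet by octet, i.e. the same way as i;octet.
    return order(min(first), min(second))
```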
Application protocols MAY return position information for substring | Application protocols MAY return position information for substring | |||
matches. If this is done, the position information MUST include both | matches. If this is done, the position information SHOULD include | |||
the starting offset and the ending offset in the string. This is | both the starting offset and the ending offset for each match. This | |||
important because more sophisticated collations can match strings of | is important because more sophisticated collations can match strings | |||
unequal length (for example, a pre-composed accented character will | of unequal length (for example, a pre-composed accented character can | |||
match a decomposed accented character). | match a decomposed accented character). In general, overlapping | |||
matches SHOULD be reported (as when "ana" occurs twice within | ||||
Collation specifications intended for common use are expected to | "banana") although there are cases where a collation may decide not | |||
reference standards from standards bodies with significant experience | to. For example, in a collation which treats all whitespace | |||
dealing with the details of international character sets. | sequences as identical, the substring operation could be defined such | |||
that " 1 " (SP "1" SP) is reported just once within " 1 " (SP SP "1" | ||||
5. Application Protocol Requirements | SP SP), not four times (SP SP 1 SP, SP 1 SP, SP 1 SP SP and SP SP 1 | |||
SP SP). | ||||
An application protocol which offers searching, substring matching | ||||
and/or sorting and permits the use of characters outside the US-ASCII | A string is a substring of itself. The empty string is a substring | |||
charset needs to consider the following requirements and issues: | of all strings. | |||
Note that the substring operation of some collations can match | ||||
strings of unequal length. For example, a pre-composed accented | ||||
character can match a decomposed accented character. Unicode | ||||
Collation Algorithm [8] discusses this in more detail. | ||||
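The position-reporting rules in the newer text (start and end offsets for each match, overlapping matches reported, a string is a substring of itself, the empty string is a substring of every string) can be illustrated with exact octet matching. This is a simplified sketch: under i;octet every match has the search string's length, so the end offset is redundant here, but it is kept because collations that equate strings of unequal length need it. `find_matches` is a hypothetical helper.

```python
from typing import List, Tuple


def find_matches(needle: bytes, haystack: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) octet offsets of every match, overlapping ones included."""
    matches = []
    for start in range(len(haystack) - len(needle) + 1):
        end = start + len(needle)
        if haystack[start:end] == needle:   # exact, i;octet-style comparison
            matches.append((start, end))
    return matches


print(find_matches(b"ana", b"banana"))   # [(1, 4), (3, 6)]
```

A protocol that only reports whether the search matched can use `bool(find_matches(...))`; note that an empty search string yields a zero-length match at every offset.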
In this specification, the return values of the substring operation | ||||
are called "match", "no-match" and "undefined". This is not a | ||||
specification, merely a choice of phrasing. | ||||
4.2.4. Ordering | ||||
The ordering operation determines how two strings are ordered. It | ||||
MUST be trichotomous and reflexive. For valid input, it MUST be | ||||
transitive. | ||||
Ordering returns "less" if the first string is listed before the | ||||
second string according to the collation, "greater" if the second | ||||
string is listed before the first string, and "equal" if the two | ||||
strings are equal as defined by the collation's equality operation. | ||||
If one or both strings are invalid, the result of ordering is | ||||
"undefined". | ||||
When the collation is used with a "+" prefix, the behavior is the | ||||
same as when used with no prefix. When the collation is used with a | ||||
"-" prefix, the result of the ordering operation of the collation | ||||
MUST be reversed. | ||||
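A sketch of the newer ordering rules, assuming an underlying comparison that yields "less", "equal", "greater" or "undefined". The text says the result under a "-" prefix MUST be reversed but does not say what reversing "undefined" means; this sketch assumes it stays "undefined". `compare_octets` and `apply_prefix` are hypothetical helpers.

```python
from typing import Dict, Literal

Order = Literal["less", "equal", "greater", "undefined"]

# What a "-" prefix turns each result into (assumption: "undefined" is kept).
_REVERSED: Dict[str, str] = {"less": "greater", "greater": "less",
                             "equal": "equal", "undefined": "undefined"}


def compare_octets(a: bytes, b: bytes) -> Order:
    """i;octet-style ordering; never "undefined", as every octet string is valid."""
    # Python orders bytes by unsigned octet value, with a proper prefix first.
    if a < b:
        return "less"
    if a > b:
        return "greater"
    return "equal"


def apply_prefix(prefix: str, result: str) -> str:
    """No prefix or "+" leaves the result alone; "-" reverses it."""
    if prefix in ("", "+"):
        return result
    if prefix == "-":
        return _REVERSED[result]
    raise ValueError(f"unknown prefix {prefix!r}")
```

Reversing every defined result is what turns an ascending sort into a descending one.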
In this specification, the return values of the ordering operation | ||||
are called "less", "equal", "greater" and "undefined". This is not a | ||||
specification, merely a choice of phrasing. | ||||
4.3. Sort Keys | ||||
A collation specification SHOULD describe the internal transformation | ||||
algorithm to generate sort keys. This algorithm can be applied to | ||||
individual strings and the result can be stored to potentially | ||||
optimize future comparison operations. A collation MAY specify that | ||||
the sort key is generated by the identity function. The sort key may | ||||
have no meaning to a human. The sort key may not be valid input to | ||||
the collation. | ||||
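To illustrate the sort keys described in the newer text, the sketch below computes a key once per string and then sorts by comparing the stored keys with plain octet ordering. It assumes an ASCII-only case-folding key (octets 0x61 to 0x7A lowered by 0x20, the mapping given for the ASCII casemap collation); the helper name is hypothetical, and the key is an internal value rather than something to show a user.

```python
def casemap_key(s: bytes) -> bytes:
    """Sort key: map octets 0x61-0x7A (a-z) to 0x41-0x5A (A-Z)."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in s)


names = [b"delta", b"Alpha", b"charlie", b"Bravo"]
keys = {n: casemap_key(n) for n in names}    # computed once, reusable later
print(sorted(names, key=keys.__getitem__))   # [b'Alpha', b'Bravo', b'charlie', b'delta']
```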
4.4. Use of Lookup Tables | ||||
Some collations use customizable lookup tables, e.g. because the | ||||
tables depend on locale and may be modified after shipping the | ||||
software. Collations which use more than one customizable lookup | ||||
table in a documented format MUST assign numbers to the tables they | ||||
use. This permits an application protocol command to access the | ||||
tables used by a server collation, so that clients and servers use | ||||
the same tables. | ||||
5. Application Protocol Requirements | ||||
This section describes the requirements and issues that an | ||||
application protocol needs to consider if it offers searching, | ||||
substring matching and/or sorting, and permits the use of characters | ||||
outside the US-ASCII charset. | ||||
5.1. Character Encoding | ||||
The protocol specification has to make clear to which characters | ||||
(rather than just octets) the collations are applied. This | ||||
can be done by specifying the protocol itself in terms of characters | ||||
(e.g. in the case of a query language), by specifying a single | ||||
character encoding for the protocol (e.g. UTF-8 [3]), or by | ||||
carefully describing the relevant issues of character encoding | ||||
labeling and conversion. In the latter case, details to consider | ||||
include how to handle unknown charsets, any charsets which are | ||||
mandatory-to-implement, any issues with byte-order that might apply, | ||||
and any transfer encodings which need to be supported. | ||||
5.2. Operations | ||||
The protocol must specify which of the operations defined in this | ||||
specification (equality matching, substring matching and ordering) | ||||
can be invoked in the protocol, and how they are invoked. There may | ||||
be more than one way to invoke an operation. | ||||
The protocol MUST provide a mechanism for the client to select the | The protocol MUST provide a mechanism for the client to select the | |||
collation to use with equality matching, substring matching and | collation to use with equality matching, substring matching and | |||
ordering. | ordering. | |||
The protocol MUST specify how comparisons behave in the absence of an | If a protocol needs a total ordering and the collation chosen does | |||
explicit collation negotiation or when a collation negotiation of "*" | not provide it because the ordering operation returns "undefined" at | |||
is used. The protocol MAY specify that the default collation used in | least once, the recommended fallback is to sort all invalid strings | |||
such circumstances is sensitive to server configuration. | after the valid ones, and use i;octet to order the invalid strings. | |||
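The recommended fallback for a total ordering in the newer text can be expressed as a two-part sort key: valid strings first, ordered by the collation, then invalid strings ordered by i;octet. This sketch assumes a hypothetical collation whose valid inputs are exactly the well-formed UTF-8 strings and whose ordering is ASCII case folding of the decoded text; a real collation would substitute its own validity test and key.

```python
from typing import Tuple, Union


def fallback_key(s: bytes) -> Tuple[int, Union[str, bytes]]:
    """Sort key: valid strings first (by collation), then invalid ones by i;octet."""
    try:
        text = s.decode("utf-8")        # hypothetical validity test: UTF-8 only
    except UnicodeDecodeError:
        return (1, s)                   # invalid: raw octets, after all valid strings
    # Hypothetical collation key for valid input: ASCII-only case folding.
    folded = "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)
    return (0, folded)


data = [b"\xff", b"beta", b"\xfe", b"Alpha"]
print(sorted(data, key=fallback_key))   # [b'Alpha', b'beta', b'\xfe', b'\xff']
```

The leading 0 or 1 ensures a valid key is never compared with an invalid one, so the str and bytes parts never meet.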
The protocol SHOULD provide a way to list available collations | Although the collation's substring function provides a list of | |||
matching a given wildcard pattern or patterns. | matches, a protocol need not return all of them to the client. It may | |||
provide only the first matching substring, or even just the | ||||
information that the substring search matched. | ||||
If the protocol provides positional information for the results of a | If the protocol provides positional information for the results of a | |||
substring match, that positional information MUST fully specify the | substring match, that positional information SHOULD fully specify the | |||
substring in the result that matches independent of the length of the | substring(s) in the result that match, independent of the length of | |||
search string. For example, returning both the starting and ending | the search string. For example, returning both the starting and | |||
offset of the match would suffice, as would the starting offset and a | ending offset of the match would suffice, as would the starting | |||
length. Returning just the starting offset is not acceptable. This | offset and a length. Returning just the starting offset is not | |||
rule is necessary because advanced collations can treat strings of | acceptable. This rule is necessary because advanced collations can | |||
different lengths as equal (for example, pre-composed and decomposed | treat strings of different lengths as equal (for example, pre- | |||
accented characters). | composed and decomposed accented characters). | |||
If the protocol permits the use of collations on stored character | 5.3. Wildcards | |||
data which is not encoded with the UTF-8 charset, then the protocol | ||||
specification has to describe relevant issues of the conversion. | The protocol MUST specify whether it allows the use of wildcards in | |||
Details to consider include how to handle unknown charsets, any | collation identifiers or not. If the protocol allows wildcards, | |||
charsets which are mandatory-to-implement, any issues with byte-order | then: | |||
that might apply, and any transfer encodings which need to be | The protocol MUST specify how comparisons behave in the absence of | |||
supported. | explicit collation negotiation or when a collation of "*" is | |||
requested. The protocol MAY specify that the default collation | ||||
used in such circumstances is sensitive to server configuration. | ||||
The protocol SHOULD provide a way to list available collations | ||||
matching a given wildcard pattern or patterns. | ||||
5.4. Canonicalization Function | ||||
If the protocol uses a canonicalization function for strings, then | ||||
use of collations MAY be appropriate for that function. As an | ||||
example, many protocols use case-independent strings. In most cases, | ||||
a simple ASCII mapping to upper/lower case works well, as i;ascii- | ||||
casemap offers. However, in some cases another collation may be | ||||
better, e.g. to handle Turkish dotted/dotless i. Protocol designers | ||||
should consider in each case whether to use a specifiable collation. | ||||
If the protocol provides a canonicalization function for strings, | 5.5. Disconnected Clients | |||
then use of collations MAY be appropriate for that function. | ||||
If the protocol supports disconnected clients, then a mechanism for | If the protocol supports disconnected clients, then a mechanism for | |||
the client to precisely replicate the server's collation algorithm is | the client to precisely replicate the server's collation algorithm is | |||
likely desirable. Thus the protocol MAY wish to provide a command to | likely desirable. Thus the protocol MAY wish to provide a command to | |||
fetch lookup tables used by charset conversions and collations. | fetch lookup tables used by charset conversions and collations. | |||
5.6. Error Codes | ||||
The protocol specification should consider assigning protocol error | The protocol specification should consider assigning protocol error | |||
codes for the following circumstances: | codes for the following circumstances: | |||
o The client requests the use of a collation by identifier or | ||||
o The client requests the use of a collation by name or pattern, but | pattern, but no implemented collation matches that pattern. | |||
no implemented collation matches that pattern. | o The client attempts to use a collation for an operation that is | |||
not supported by that collation. For example, attempting to use | ||||
o The client attempts to use a collation for a function that is not | the "i;ascii-numeric" collation for substring matching. | |||
supported by that collation. For example, attempting to use the | ||||
"i;ascii-numeric" collation for a substring matching function. | ||||
o The client uses an equality or substring matching collation and | o The client uses an equality or substring matching collation and | |||
the result is an error. It may be appropriate to distinguish | the result is an error. It may be appropriate to distinguish | |||
between the two input strings, particularly when one is supplied | between the two input strings, particularly when one is supplied | |||
by the client and one is stored by the server. It might also be | by the client and one is stored by the server. It might also be | |||
appropriate to distinguish the specific case of an invalid UTF-8 | appropriate to distinguish the specific case of an invalid UTF-8 | |||
string. | string. | |||
If the protocol permits the use of a collation with data structures | 5.7. Octet Collation | |||
beyond those described in this specification (octet strings, NULL | ||||
string, array of octet strings), the protocol MUST describe the | ||||
default behavior for a collation with that data structure. | ||||
6. Initial Collations | The i;octet (Section 9.4) collation is only usable with protocols | |||
based on octet-strings. Clients and servers MUST NOT use i;octet | ||||
with other protocols. | ||||
This section describes an initial set of collations for the collation | If the protocol permits the use of collations with data structures | |||
registry. | other than strings, the protocol MUST describe the default behavior | |||
for a collation with those data structures. | ||||
6.1 Octet Collation | ||||
The "i;octet" collation is a simple and fast collation intended for | ||||
use on binary octet strings rather than on character data. It never | ||||
returns an "error" result. It provides equality, substring and | ||||
ordering functions. The ordering algorithm is as follows: | ||||
1. If both strings are the empty string, return the result "0". | ||||
2. If the first string is empty and the second is not, return the | ||||
result "-1". | ||||
3. If the second string is empty and the first is not, return the | 6. Use by Existing Protocols | |||
result "+1". | ||||
4. If both strings begin with the same octet value, remove the first | ||||
octet from both strings and repeat this algorithm from step 1. | ||||
5. If the unsigned value (0 to 255) of the first octet of the first | Both ACAP [12] and Sieve [15] are standards track specifications | |||
string is less than the unsigned value of the first octet of the | which used collations prior to the creation of this specification and | |||
second string, then return "-1". | registry. Those standards do not meet all the application protocol | |||
requirements described in Section 5. | ||||
6. If this step is reached, return "+1". | These protocols allow the use of the i;octet (Section 9.4) collation | |||
working directly on UTF-8 data as used in these protocols. | ||||
This algorithm is roughly equivalent to the C library function memcmp | In Sieve, all matches are either true or false. Accordingly, Sieve | |||
with appropriate length checks added. | servers must treat "undefined" and "no-match" results of the equality | |||
and substring operations as false, and only "match" as true. | ||||
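The numbered steps for i;octet ordering in the older text map one-to-one onto a loop. A minimal Python sketch (the function name is hypothetical) that returns -1, 0 or +1 as that text does; advancing an index stands in for "remove the first octet from both strings and repeat":

```python
def octet_order(a: bytes, b: bytes) -> int:
    """i;octet ordering: -1 if a sorts first, +1 if b sorts first, 0 if equal."""
    i = 0
    while True:
        a_done, b_done = i >= len(a), i >= len(b)
        if a_done and b_done:      # step 1: both remaining strings are empty
            return 0
        if a_done:                 # step 2: only the first is empty
            return -1
        if b_done:                 # step 3: only the second is empty
            return 1
        if a[i] == b[i]:           # step 4: same octet, move on and repeat
            i += 1
            continue
        # Steps 5 and 6: indexing bytes yields unsigned values 0 to 255.
        return -1 if a[i] < b[i] else 1
```

Python's built-in bytes comparison already follows these rules, so `(a > b) - (a < b)` gives the same answer; that plays the role the memcmp remark plays for C.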
The matching function returns "match" if the sorting algorithm would | In ACAP and Sieve, there are no invalid strings. In this document's | |||
return "0". Otherwise the matching function returns "no-match". | terms, invalid strings sort after valid strings. | |||
The substring function returns "match" if the first string is the | IMAP [16] also collates, although that is explicit only when the | |||
empty string, or if there exists a substring of the second string of | COMPARATOR [18] extension is used. The built-in IMAP substring | |||
length equal to the length of the first string which would result in | operation and the ordering provided by the SORT [17] extension may | |||
a "match" result from the equality function. Otherwise the substring | not meet the requirements made in this document. | |||
function returns "no-match". | ||||
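The older text defines i;octet equality as "the ordering would return 0" and substring matching as a search with a window of the first string's length. A short sketch, with hypothetical names and Python's bytes equality standing in for an ordering result of 0:

```python
def octet_equal(a: bytes, b: bytes) -> str:
    # "match" exactly when the i;octet ordering would return 0.
    return "match" if a == b else "no-match"


def octet_substring(needle: bytes, haystack: bytes) -> str:
    if not needle:                 # the empty string always matches
        return "match"
    n = len(needle)
    for start in range(len(haystack) - n + 1):
        # Compare every window of the same length as the first string.
        if octet_equal(needle, haystack[start:start + n]) == "match":
            return "match"
    return "no-match"
```

The result agrees with Python's `needle in haystack` for bytes, which makes a convenient cross-check.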
The associated canonicalization algorithm is the identity function. | Other protocols may be in a similar position. | |||
6.2 ASCII Numeric Collation | In IMAP, the default collation is i;ascii-casemap, because its | |||
operations most closely resemble IMAP's built-in operations. | ||||
The "i;ascii-numeric" collation is a simple collation intended for | 7. Collation Registration | |||
use with arbitrary sized decimal numbers stored as octet strings of | ||||
US-ASCII digits (0x30 to 0x39). It supports equality and ordering, | ||||
but does not support the substring function. The algorithm is as | ||||
follows: | ||||
1. If neither string begins with a digit, return "error" if | ||||
matching, or the result of the "i;octet" collation for ordering. | ||||
2. If the first string begins with a digit and the second string | ||||
does not, return "error" if matching and "-1" for ordering. | ||||
3. If the second string begins with a digit and the first string | ||||
does not, return "error" if matching and "+1" for ordering. | ||||
4. Let "n" be the number of digits at the beginning of the first | ||||
string, and "m" be the number of digits at the beginning of the | ||||
second string. | ||||
5. If n is equal to m, return the result of the "i;octet" collation. | ||||
6. If n is greater than m, prepend a string of "n - m" zeros to the | ||||
second string and return the result of the "i;octet" collation. | ||||
7. If m is greater than n, prepend a string of "m - n" zeros to the | ||||
first string and return the result of the "i;octet" collation. | ||||
The associated canonicalization algorithm is to truncate the input | ||||
string at the first non-digit character. | ||||
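The seven steps of the older i;ascii-numeric definition transcribe as below. This is a literal sketch with hypothetical names. Because steps 5 to 7 hand the whole (zero-padded) strings to i;octet, any octets after the leading digits still take part in the comparison, even though the associated canonicalization truncates them.

```python
from typing import Union


def _leading_digits(s: bytes) -> int:
    """Count the US-ASCII digits (0x30-0x39) at the start of s."""
    n = 0
    while n < len(s) and 0x30 <= s[n] <= 0x39:
        n += 1
    return n


def _octet(a: bytes, b: bytes) -> int:
    return (a > b) - (a < b)       # i;octet ordering as -1/0/+1


def ascii_numeric(a: bytes, b: bytes, matching: bool) -> Union[str, int]:
    """Return "match"/"no-match"/"error" if matching, else -1/0/+1."""
    n, m = _leading_digits(a), _leading_digits(b)
    if n == 0 and m == 0:                      # step 1
        return "error" if matching else _octet(a, b)
    if m == 0:                                 # step 2: only a starts with a digit
        return "error" if matching else -1
    if n == 0:                                 # step 3: only b starts with a digit
        return "error" if matching else 1
    if n > m:                                  # step 6: pad the second string
        b = b"0" * (n - m) + b
    elif m > n:                                # step 7: pad the first string
        a = b"0" * (m - n) + a
    result = _octet(a, b)                      # steps 5-7: defer to i;octet
    if matching:
        return "match" if result == 0 else "no-match"
    return result


def ascii_numeric_canonical(s: bytes) -> bytes:
    # Associated canonicalization: truncate at the first non-digit.
    return s[:_leading_digits(s)]
```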
6.3 ASCII Casemap Collation | ||||
The "en;ascii-casemap" collation is a simple collation intended for | ||||
use with English language text in pure US-ASCII. It provides | ||||
equality, substring and ordering functions. The algorithm first | ||||
applies a canonicalization algorithm to both input strings which | ||||
subtracts 32 (0x20) from all octet values between 97 (0x61) and 122 | ||||
(0x7A) inclusive. The result of the collation is then the same as | ||||
the result of the "i;octet" collation for the canonicalized strings. | ||||
Care should be taken when using OS-supplied functions to implement | ||||
this collation as this is not locale sensitive, but functions such as | ||||
strcasecmp and toupper can be locale sensitive. | ||||
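A sketch of en;ascii-casemap in Python, with hypothetical function names. It avoids the locale pitfall mentioned above because `bytes.upper()` only changes the ASCII lowercase letters, regardless of locale, which is exactly the "subtract 32 from octets 0x61 to 0x7A" mapping.

```python
def casemap_canonical(s: bytes) -> bytes:
    # bytes.upper() touches only a-z; octets >= 0x80 are left unchanged.
    return s.upper()


def casemap_order(a: bytes, b: bytes) -> int:
    """i;octet ordering (-1/0/+1) applied to the canonicalized strings."""
    ca, cb = casemap_canonical(a), casemap_canonical(b)
    return (ca > cb) - (ca < cb)


def casemap_equal(a: bytes, b: bytes) -> str:
    return "match" if casemap_order(a, b) == 0 else "no-match"
```

One visible consequence of mapping to upper case is that "_" (0x5F) sorts after every letter, since letters are compared as 0x41 to 0x5A.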
For historical reasons, in the context of ACAP and Sieve, the name | 7.1. Collation Registration Procedure | |||
"i;ascii-casemap" is a synonym for this collation. | ||||
6.4 Nameprep Collation | The IETF will create a mailing list, collation@ietf.org, which can be | |||
used for public discussion of collation proposals prior to | ||||
registration. Use of the mailing list is strongly encouraged. The | ||||
IESG will appoint a designated expert who will monitor the | ||||
collation@ietf.org mailing list and review registrations. | ||||
The "i;nameprep;v=1;uv=3.2" collation is an implementation of the | The registration procedure begins when a completed registration | |||
nameprep [7] specification based on normalization tables from Unicode | template is sent to iana@iana.org and collation@ietf.org. The | |||
version 3.2. This collation applies the nameprep canonicalization | ||||
function to both input strings and then returns the result of the | ||||
i;octet collation on the canonicalized strings. While this collation | ||||
offers all three functions, the ordering function it provides is | ||||
inadequate for use by the majority of the world. | ||||
Version number 1 is applied to nameprep as specified in RFC 3491. If | ||||
the nameprep specification is revised without any changes that would | ||||
produce different results when given the same pair of input octet | ||||
strings, then the version number will remain unchanged. | ||||
The table numbers for tables used by nameprep are as follows: | ||||
+--------------+-----------------------+ | ||||
| Table Number | Table Name | | ||||
+--------------+-----------------------+ | ||||
| 1 | UnicodeData-3.2.0.txt | | ||||
| 2 | Table B.1 | | ||||
| 3 | Table B.2 | | ||||
| 4 | Table C.1.2 | | ||||
| 5 | Table C.2.2 | | ||||
| 6 | Table C.3 | | ||||
| 7 | Table C.4 | | ||||
| 8 | Table C.5 | | ||||
| 9 | Table C.6 | | ||||
| 10 | Table C.7 | | ||||
| 11 | Table C.8 | | ||||
| 12 | Table C.9 | | ||||
+--------------+-----------------------+ | ||||
6.5 Basic Collation | ||||
The basic collation is intended to provide tolerable results for a | ||||
number of languages for all three functions (equality, substring and | ||||
ordering) so it is suitable as a mandatory-to-implement collation for | ||||
protocols which include ordering support. The ordering function of | ||||
the basic collation is the Unicode Collation Algorithm [8] version 9 | ||||
(UCAv9). | ||||
The equality and substring functions are created as described in | ||||
UCAv9 section 8. While that section is informative to UCAv9, it is | ||||
normative to this collation specification. | ||||
This collation is based on Unicode version 3.2, with the following | ||||
tables relevant: | ||||
1. For the normalization step, UnicodeData-3.2.0.txt [16] is used. | ||||
Column 5 is used to determine the canonical decomposition, while | ||||
column 3 contains the canonical combining classes necessary to | ||||
attain canonical order. | ||||
2. The table of characters which require a logical order exception | ||||
is a subset of the table in PropList-3.2.0.txt [17] and is | ||||
included here: | ||||
0E40..0E44 ; Logical_Order_Exception | ||||
# Lo [5] THAI CHARACTER SARA E..THAI CHARACTER SARA AI MAIMALAI | ||||
0EC0..0EC4 ; Logical_Order_Exception | ||||
# Lo [5] LAO VOWEL SIGN E..LAO VOWEL SIGN AI | ||||
# Total code points: 10 | ||||
3. The table used to translate normalized code points to a sort key | ||||
is allkeys-3.1.1.txt [18]. | ||||
UCAv9 includes a number of configurable parameters and steps labelled | ||||
as potentially optional. The following list summarizes the defaults | ||||
used by this collation: | ||||
o The logical order exception step is mandatory by default to | ||||
support the largest number of languages. | ||||
o Steps 2.1.1 to 2.1.3 are mandatory as the repertoire of the basic | ||||
collation is intended to be large. | ||||
o The second level in the sort key is evaluated forwards by default. | ||||
o The variable weighting uses the "non-ignorable" option by default. | ||||
o The semi-stable option is not used by default. | ||||
o Support for exactly three levels of collation is the default | ||||
behavior. | ||||
o No preprocessing step is used by the basic collation prior to | ||||
applying the UCAv9 algorithm. Note that an application protocol | ||||
specification MAY require pre-processing prior to the use of any | ||||
collations. | ||||
o The equality and substring algorithms exclude differences at level | ||||
2 and 3 by default (thus it is case-insensitive and ignores | ||||
accentual distinctions). | ||||
o The equality and substring algorithms use the "Whole Characters | ||||
Only" feature described in UCAv9 section 8 by default. | ||||
The exact collation name with these defaults is | ||||
"i;basic;uca=3.1.1;uv=3.2". When a specification states that the | ||||
basic collation is mandatory-to-implement, only this specific name is | ||||
mandatory-to-implement. | ||||
In order to allow modification of the optional behaviors, the | ||||
following ABNF is used for variations of the basic collation: | ||||
basic-collation = ("i" / Language-Tag) ";basic;uca=3.1.1;uv=3.2" | ||||
[";match=accent" / ";match=case"] | ||||
[";tailor=" 1*collation-char ] | ||||
If multiple modifiers appear, they MUST appear in the order described | ||||
above. The modifiers have the following meanings: | ||||
match=accent Both the first and second levels of the sort keys are | ||||
considered relevant to the equality and substring | ||||
operations (rather than the default of first level | ||||
only). This makes the matching functions sensitive to | ||||
accentual distinctions. | ||||
match=case The first three levels of sort keys are considered | ||||
relevant to the equality and substring operations. | ||||
This makes the matching functions sensitive to both | ||||
case and accentual distinctions. | ||||
The default weighting option is "non-ignorable". The "semi-stable" | ||||
sort key option is not used by default. | ||||
The canonicalization algorithm associated with this collation is the | ||||
output of step 3 of the UCAv9 algorithm (described in section 4.3 of | ||||
the UCA specification). This canonicalization is not suitable for | ||||
human consumption. | ||||
Finally, the UCAv9 algorithm permits the "allkeys" table to be | ||||
tailored to a language. People who make quality tailorings are | ||||
encouraged to register those tailorings using the collation registry. | ||||
Tailoring names beginning with "x" are reserved for experimental use, | ||||
are treated as "Limited use" and MUST NOT match wildcards if any | ||||
registered collation is available that does match. | ||||
7. Use by ACAP and Sieve | ||||
Both ACAP [11] and Sieve [15] are standards track specifications | ||||
which used collations prior to the creation of this specification and | ||||
registry. Those standards do not meet all the application protocol | ||||
requirements described in Section 5. For backwards compatibility, | ||||
those protocols use the "i;ascii-casemap" instead of | ||||
"en;ascii-casemap". | ||||
8. IANA Considerations | ||||
8.1 Collation Registration Procedure | ||||
IANA will create a mailing list collation@iana.org which can be used | ||||
for public discussion of collation proposals prior to registration. | ||||
Use of the mailing list is encouraged but not required. The actual | ||||
registration procedure will not begin until the completed | ||||
registration template is sent to iana@iana.org. The IESG will | ||||
appoint a designated expert who will monitor the collation@iana.org | ||||
mailing list and review registrations forwarded from IANA. The | ||||
designated expert is expected to tell IANA and the submitter of the | designated expert is expected to tell IANA and the submitter of the | |||
registration within two weeks whether the registration is approved, | registration within two weeks whether the registration is approved, | |||
approved with minor changes, or rejected with cause. When a | approved with minor changes, or rejected with cause. When a | |||
registration is rejected with cause, it can be re-submitted if the | registration is rejected with cause, it can be re-submitted if the | |||
concerns listed in the cause are addressed. Decisions made by the | concerns listed in the cause are addressed. Decisions made by the | |||
designated expert can be appealed to the IESG and subsequently follow | designated expert can be appealed to the IESG Applications Area Director, | |||
the normal appeals procedure for IESG decisions. | then to the IESG. They follow the normal appeals procedure for IESG | |||
decisions. | ||||
Collation registrations in a standards track, BCP or IESG-approved | Collation registrations in a standards track, BCP or IESG-approved | |||
experimental RFC are owned by the IESG and changes to the | experimental RFC are owned by the IETF, and changes to the | |||
registration follow normal procedures for updating such documents. | registration follow normal procedures for updating such documents. | |||
Collation registrations in other RFCs are owned by the RFC author(s). | Collation registrations in other RFCs are owned by the RFC author(s). | |||
Other collation registrations are owned by the individual(s) listed | Other collation registrations are owned by the individual(s) listed | |||
in the contact field of the registration and IANA will preserve this | in the contact field of the registration and IANA will preserve this | |||
information. Changes to a registration MUST be approved by the | information. Changes to a registration MUST be approved by the | |||
owner. In the event the owner can't be contacted for a period of one | owner. In the event the owner cannot be contacted for a period of | |||
month and a change is deemed necessary, the IESG MAY re-assign | one month and a change is deemed necessary, the IESG MAY re-assign | |||
ownership to an appropriate party. | ownership to an appropriate party. | |||
8.2 Collation Registration Template | 7.2. Collation Registration Format | |||
Registration of a collation is done by sending a well-formed XML | Registration of a collation is done by sending a well-formed XML | |||
document that validates with collationreg.dtd (Section 9). The | document to collation@ietf.org and iana@iana.org. | |||
registration MUST include a collation element that MAY include an | ||||
"rfc=" attribute if the specification is in an RFC and MUST include a | ||||
scope attribute of "i18n", "local" or "other" and an intendedUse | ||||
attribute of "common", "limited", "vendor", or "deprecated". | ||||
The collation element contains the other elements in the | 7.2.1. Registration Template | |||
registration. The mandatory name element gives the precise name of | ||||
the comparator. The mandatory title element give the title of the | ||||
comparator. The mandatory functions element lists which of the three | ||||
functions the comparator provides. The mandatory specification | ||||
element describes where to find the specification, and MAY have a URI | ||||
attribute. The submittor element provides an RFC 2822 email address | ||||
for the person who submitted the registration. It is optional if the | ||||
owner element contains an email address. The mandatory owner element | ||||
contains either the four letters "IETF" or an email address of the | ||||
owner of the registration. The optional version element is included | ||||
when the registration is likely to be revised or has been revised in | ||||
such a way that the results change for certain input strings. The | ||||
optional UnicodeVersion element indicates the version number of the | ||||
UnicodeData file on which the collation is based. The optional | ||||
UCAVersion element specifics the version of the Unicode Collation | ||||
Algorithm on which the collation is based. The optional | ||||
UCAMatchLevel element specifies the number of Unicode Collation | ||||
Algorithm sort key levels used for the equality and substring | ||||
operations. | ||||
Here is a template for the registration: | Here is a template for the registration: | |||
<?xml verison='1.0'?> | <?xml version='1.0'?> | |||
<!DOCTYPE rfc SYSTEM 'collationreg.dtd'> | <!DOCTYPE collation SYSTEM 'collationreg.dtd'> | |||
<collation rfc="XXXX" scope="i18n" intendedUse="common"> | <collation rfc="YYYY" scope="i18n" intendedUse="common"> | |||
<name>collation name</name> | <identifier>collation identifier</identifier> | |||
<title>technical title for collation</title> | <title>technical title for collation</title> | |||
<functions>equality order substring</functions> | <operations>equality order substring</operations> | |||
<specification>specification reference</specification> | <specification>specification reference</specification> | |||
<owner>email address of owner or IETF</owner> | <owner>email address of owner or IETF</owner> | |||
<submittor>email address of submittor<submittor> | <submitter>email address of submitter</submitter> | |||
<version>1</version> | <version>1</version> | |||
<UnicodeVersion>3.2</UnicodeVersion> | ||||
<UCAVersion>3.1.1</UCAVersion> | ||||
</collation> | </collation> | |||
7.2.2. The collation Element | ||||
The root of the registration document MUST be a <collation> element. | ||||
The collation element contains the other elements in the | ||||
registration, which are described in the following sub-subsections, | ||||
in the order given here. | ||||
The <collation> element MAY include an "rfc=" attribute if the | ||||
specification is in an RFC. The "rfc=" attribute gives only the | ||||
number of the RFC, without any prefix, such as "RFC", or suffix, such | ||||
as ".txt". | ||||
The <collation> element MUST include a "scope=" attribute, which MUST | ||||
have one of the values "i18n", "local" or "other". | ||||
The <collation> element MUST include an "intendedUse=" attribute, | ||||
which MUST have one of the values "common", "limited", "vendor", or | ||||
"deprecated". Collation specifications intended for "common" use are | ||||
expected to reference standards from standards bodies with | ||||
significant experience dealing with the details of international | ||||
character sets. | ||||
Be aware that future revisions of this specification may add | Be aware that future revisions of this specification may add | |||
additional function types, as well as additional XML attributes and | additional function types, as well as additional XML attributes, | |||
values. Any system which automatically parses these XML documents | values and elements. Any system which automatically parses these XML | |||
MUST take this into account to preserve future compatibility. | documents MUST take this into account to preserve future | |||
compatibility. | ||||
8.3 Octet Collation Registration | 7.2.3. The identifier Element | |||
<?xml verison='1.0'?> | The <identifier> element gives the precise identifier of the | |||
<!DOCTYPE rfc SYSTEM 'collationreg.dtd'> | collation, e.g. i;ascii-casemap. The <identifier> element is | |||
<collation rfc="XXXX" scope="i18n" intendedUse="common"> | mandatory. | |||
<name>i;octet</name> | ||||
<title>Octet</title> | ||||
<functions>equality order substring</functions> | ||||
<specification>RFC XXXX</specification> | ||||
<owner>IETF</owner> | ||||
<submittor>chris.newman@sun.com<submittor> | ||||
</collation> | ||||
8.4 ASCII Numeric Collation Registration | 7.2.4. The title Element | |||
<?xml verison='1.0'?> | The <title> element gives the title of the collation. The <title> | |||
<!DOCTYPE rfc SYSTEM 'collationreg.dtd'> | element is mandatory. | |||
<collation rfc="XXXX" scope="other" intendedUse="limited"> | ||||
<name>i;ascii-numeric</name> | ||||
<title>ASCII Numeric</title> | ||||
<functions>equality order</functions> | ||||
<specification>RFC XXXX</specification> | ||||
<owner>IETF</owner> | ||||
<submittor>chris.newman@sun.com<submittor> | ||||
</collation> | ||||
8.5 Legacy English Casemap Collation Registration | 7.2.5. The operations Element | |||
<?xml verison='1.0'?> | The <operations> element lists which of the three operations | |||
<!DOCTYPE rfc SYSTEM 'collationreg.dtd'> | ("equality", "order" or "substring") the collation provides, | |||
<collation rfc="XXXX" scope="local" intendedUse="deprecated"> | separated by single spaces. The <operations> element is mandatory. | |||
<name>i;ascii-casemap</name> | ||||
<title>Legacy English Casemap</title> | ||||
<functions>equality order substring</functions> | ||||
<specification>RFC XXXX</specification> | ||||
<owner>IETF</owner> | ||||
<submittor>chris.newman@sun.com<submittor> | ||||
</collation> | ||||
8.6 English Casemap Collation Registration | 7.2.6. The specification Element | |||
<?xml verison='1.0'?> | The <specification> element describes where to find the | |||
<!DOCTYPE rfc SYSTEM 'collationreg.dtd'> | specification. The <specification> element is mandatory. It MAY | |||
<collation rfc="XXXX" scope="local" intendedUse="common"> | have a URI attribute. There may be more than one <specification> | |||
<name>en;ascii-casemap</name> | element, in which case they together form the specification. | |||
<title>English Casemap</title> | ||||
<functions>equality order substring</functions> | ||||
<specification>RFC XXXX</specification> | ||||
<owner>IETF</owner> | ||||
<submittor>chris.newman@sun.com<submittor> | ||||
</collation> | ||||
8.7 Nameprep Collation Registration | If it is discovered that parts of a collation specification conflict, | |||
a new revision of the collation is necessary, and the | ||||
collation@ietf.org mailing list should be notified. | ||||
<?xml verison='1.0'?> | 7.2.7. The submitter Element | |||
<!DOCTYPE rfc SYSTEM 'collationreg.dtd'> | ||||
<collation rfc="XXXX" scope="i18n" intendedUse="common"> | ||||
<name>i;nameprep;v=1;uv=3.2</name> | ||||
<title>Nameprep</title> | ||||
<functions>equality order substring</functions> | ||||
<specification>RFC XXXX</specification> | ||||
<owner>IETF</owner> | ||||
<submittor>chris.newman@sun.com<submittor> | ||||
<version>1</version> | ||||
<UnicodeVersion>3.2</UnicodeVersion> | ||||
</collation> | ||||
8.8 Basic Collation Registration | The <submitter> element provides an RFC 2822 [13] email address for | |||
the person who submitted the registration. It is optional if the | ||||
<owner> element contains an email address. | ||||
<?xml verison='1.0'?> | There may be more than one <submitter> element. | |||
<!DOCTYPE rfc SYSTEM 'collationreg.dtd'> | ||||
<collation rfc="XXXX" scope="i18n" intendedUse="common"> | ||||
<name>i;basic;uca=3.1.1;uv=3.2</name> | ||||
<title>Basic</title> | ||||
<functions>equality order substring</functions> | ||||
<specification>RFC XXXX</specification> | ||||
<owner>IETF</owner> | ||||
<submittor>chris.newman@sun.com<submittor> | ||||
<UnicodeVersion>3.2</UnicodeVersion> | ||||
<UCAVersion>3.1.1</UCAVersion> | ||||
<UCAMatchLevel>1</UCAMatchLevel> | ||||
</collation> | ||||
8.9 Basic Accent Sensitive Match Collation Registration | 7.2.8. The owner Element | |||
<?xml verison='1.0'?> | The <owner> element contains either the four letters "IETF" or an | |||
<!DOCTYPE rfc SYSTEM 'collationreg.dtd'> | email address of the owner of the registration. The <owner> element | |||
<collation rfc="XXXX" scope="i18n" intendedUse="common"> | is mandatory. There may be more than one <owner> element. If so, | |||
<name>i;basic;uca=3.1.1;uv=3.2;match=accent</name> | all owners are equal. Each owner can speak for all. | |||
<title>Basic Accent Sensitive Match</title> | ||||
<functions>equality order substring</functions> | ||||
<specification>RFC XXXX</specification> | ||||
<owner>IETF</owner> | ||||
<submittor>chris.newman@sun.com<submittor> | ||||
<UnicodeVersion>3.2</UnicodeVersion> | ||||
<UCAVersion>3.1.1</UCAVersion> | ||||
<UCAMatchLevel>2</UCAMatchLevel> | ||||
</collation> | ||||
8.10 Basic Case Sensitive Match Collation Registration | 7.2.9. The version Element | |||
<?xml verison='1.0'?> | The <version> element is included when the registration is likely to | |||
<!DOCTYPE rfc SYSTEM 'collationreg.dtd'> | be revised or has been revised in such a way that the results change | |||
<collation rfc="XXXX" scope="i18n" intendedUse="common"> | for certain input strings. The <version> element is optional. | |||
<name>i;basic;uca=3.1.1;uv=3.2;match=case</name> | ||||
<title>Basic Case Sensitive Match</title> | 7.2.10. The variable Element | |||
<functions>equality order substring</functions> | ||||
<specification>RFC XXXX</specification> | The <variable> element specifies a variable that can be used to | |||
<owner>IETF</owner> | tailor the collation's behaviour. The <variable> element is | |||
<submittor>chris.newman@sun.com<submittor> | optional. When it is used, it must contain <name> and <default> | |||
<UnicodeVersion>3.2</UnicodeVersion> | elements and may contain one or more <value> elements. | |||
<UCAVersion>3.1.1</UCAVersion> | ||||
<UCAMatchLevel>3</UCAMatchLevel> | ||||
</collation> | ||||
8.11 Structure of Collation Registry | 7.2.11. The name Element | |||
The <name> element specifies the name of a variable. The | ||||
<name> element is mandatory. | ||||
7.2.12. The default Element | ||||
The <default> element specifies the default value of a variable. The | ||||
<default> element is mandatory. | ||||
7.2.13. The value Element | ||||
The <value> element specifies a legal value of a variable. The | ||||
<value> element is optional. If one or more <value> elements are | ||||
present, only those values are legal. If none is present, the | ||||
variable's legal values do not form an enumerated set, and the rules | ||||
MUST be specified in an RFC accompanying the registration. | ||||
7.3. Structure of Collation Registry | ||||
Once the registration is approved, IANA will store each XML | Once the registration is approved, IANA will store each XML | |||
registration document in a URL of the form http://www.iana.org/ | registration document in a URL of the form | |||
assignments/collation/collation-name.xml where collation-name is the | http://www.iana.org/assignments/collation/collation-id.xml where | |||
contents of the name element in the registration. Both the submittor | collation-id is the contents of the identifier element in the | |||
and the designated expert is responsible for verifying that the XML | registration. Both the submitter and the designated expert are | |||
is well-formed and complies with the DTD. In the future, it is hoped | responsible for verifying that the XML is well-formed. The | |||
IANA will take over XML verification responsibility from the | registration document should avoid using new elements. If any are | |||
designated expert. | necessary, it is important to be consistent with other registrations. | |||
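Part of the well-formedness check that the submitter and the designated expert perform can be automated. The following is a minimal Python sketch, not an official IANA tool: the function name check_registration is hypothetical, and it checks only well-formedness, the root element, the scope and intendedUse values, and the presence of the mandatory elements described in Section 7.2. Unknown elements are deliberately ignored, since future revisions may add them.

```python
import xml.etree.ElementTree as ET

# Mandatory elements and allowed attribute values, as described in Section 7.2.
REQUIRED = ("identifier", "title", "operations", "specification", "owner")
SCOPES = {"i18n", "local", "other"}
USES = {"common", "limited", "vendor", "deprecated"}
OPERATIONS = {"equality", "order", "substring"}


def check_registration(text):
    """Return a list of problems; an empty list means none were found.

    Unknown elements and attributes are ignored, because future revisions
    of the registration format may add them.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return ["not well-formed: %s" % exc]
    problems = []
    if root.tag != "collation":
        problems.append("root element is <%s>, not <collation>" % root.tag)
    if root.get("scope") not in SCOPES:
        problems.append("missing or invalid scope attribute")
    if root.get("intendedUse") not in USES:
        problems.append("missing or invalid intendedUse attribute")
    for name in REQUIRED:
        if root.find(name) is None:
            problems.append("missing <%s> element" % name)
    # Operations must be separated by single spaces.
    ops = root.findtext("operations")
    if ops is not None and not set(ops.split(" ")) <= OPERATIONS:
        problems.append("unknown value in <operations>")
    return problems
```

A registration that leaves an element unclosed, for example <submitter>...<submitter> instead of <submitter>...</submitter>, is rejected by the parser before any of the other checks run.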
IANA will also maintain a text summary of the registry under the name | IANA will also maintain a text summary of the registry under the name | |||
http://www.iana.org/assignments/collation/summary.txt. This summary | http://www.iana.org/assignments/collation/summary.txt. This summary | |||
is divided into four sections. The first section is for collations | is divided into four sections. The first section is for collations | |||
intended for common use. This section is intended for collation | intended for common use. This section is intended for collation | |||
registrations published in IESG approved RFCs or for locally scoped | registrations published in IESG approved RFCs or for locally scoped | |||
collations from the primary standards body for that locale. The | collations from the primary standards body for that locale. The | |||
designated expert is encouraged to reject collation registrations | designated expert is encouraged to reject collation registrations | |||
with an intended use of "common" if the expert believes it should be | with an intended use of "common" if the expert believes it should be | |||
"limited", as it is desirable to keep the number of "common" | "limited", as it is desirable to keep the number of "common" | |||
registrations small and high quality. The second section is reserved | registrations small and high quality. The second section is reserved | |||
for limited use collations. The third section is reserved for | for limited use collations. The third section is reserved for | |||
registered vendor specific collations. The final section is reserved | registered vendor specific collations. The final section is reserved | |||
for deprecated collations. | for deprecated collations. | |||
8.12 Example Initial Registry Summary | 7.4. Example Initial Registry Summary | |||
The following is an example of how IANA might structure the initial | The following is an example of how IANA might structure the initial | |||
registry summary.txt file: | registry summary.txt file: | |||
Collation Functions Scope Reference | Collation Functions Scope Reference | |||
--------- --------- ----- --------- | --------- --------- ----- --------- | |||
Common Use Collations: | Common Use Collations: | |||
i;octet e, o, s Other [RFC XXXX] | ||||
i;nameprep;v=1;uv=3.2 e, o, s i18n [RFC XXXX] | i;nameprep;v=1;uv=3.2 e, o, s i18n [RFC XXXX] | |||
i;basic;uca=3.1.1;uv=3.2 e, o, s i18n [RFC XXXX] | i;ascii-casemap e, o, s Local [RFC XXXX] | |||
i;basic;uca=3.1.1;uv=3.2;match=accent e, o, s i18n [RFC XXXX] | ||||
i;basic;uca=3.1.1;uv=3.2;match=case e, o, s i18n [RFC XXXX] | ||||
en;ascii-casemap e, o, s Local [RFC XXXX] | ||||
Limited Use Collations: | Limited Use Collations: | |||
i;octet e, o, s Other [RFC XXXX] | ||||
i;ascii-numeric e, o Other [RFC XXXX] | i;ascii-numeric e, o Other [RFC XXXX] | |||
Vendor Collations: | Vendor Collations: | |||
Deprecated Collations: | Deprecated Collations: | |||
i;ascii-casemap e, o, s Local [RFC XXXX] | ||||
References | References | |||
---------- | ---------- | |||
[RFC XXXX] Newman, C., "Internet Application Protocol Collation | [RFC XXXX] Newman, C., Duerst, M., Gulbrandsen, A., "Internet | |||
Registry", RFC XXXX, Sun Microsystems, October 2003. | Application Protocol Collation Registry", RFC XXXX, | |||
Sun Microsystems, October 2013. | ||||
9. DTD for Collation Registration | ||||
<!-- | 8. Guidelines for Expert Reviewer | |||
DTD for Collation Registration Document | ||||
Data types: | ||||
entity description | ||||
====== =========== | ||||
NUMBER [0-9]+ | ||||
URI As defined in RFC 2396 | ||||
CTEXT printable ASCII text (no line-terminators) | ||||
TEXT character data | ||||
--> | ||||
<!ENTITY % NUMBER "CDATA"> | ||||
<!ENTITY % URI "CDATA"> | ||||
<!ENTITY % CTEXT "#PCDATA"> | ||||
<!ENTITY % TEXT "#PCDATA"> | ||||
<!ELEMENT collation (name,title,functions,specification+,owner+, | ||||
submittor*,version?,UnicodeVersion?, | ||||
UCAVersion?,UCAMatchLevel?)> | ||||
<!ATTLIST collation | ||||
rfc %NUMBER; "0" | ||||
scope (i18n|local|other) #IMPLIED | ||||
intendedUse (common|limited|vendor|deprecated) #IMPLIED> | ||||
<!ELEMENT name (%CTEXT;)> | ||||
<!ELEMENT title (%CTEXT;)> | ||||
<!ELEMENT functions (%CTEXT;)> | ||||
<!ELEMENT specification (%TEXT;)> | ||||
<!ATTLIST specification | ||||
uri %URI; ""> | ||||
<!ELEMENT owner (%CTEXT;)> | ||||
<!ELEMENT submittor (%CTEXT;)> | ||||
<!ELEMENT version (%CTEXT;)> | ||||
<!ELEMENT UnicodeVersion (%CTEXT;)> | ||||
<!ELEMENT UCAVersion (%CTEXT;)> | ||||
<!ELEMENT UCAMatchLevel (%CTEXT;)> | ||||
10. Guidelines for Expert Reviewer | ||||
The expert reviewer appointed by the IESG has fairly broad latitude | The expert reviewer appointed by the IESG has fairly broad latitude | |||
for this registry. While a number of collations are expected | for this registry. While a number of collations are expected | |||
(particularly customizations of the basic collation for localized | (particularly customizations of the basic collation for localized | |||
use), an explosion of collations (particularly common use collations) | use), an explosion of collations (particularly common use collations) | |||
is not desirable for widespread interoperability. However, it is | is not desirable for widespread interoperability. However, it is | |||
important for the expert reviewer to provide cause when rejecting a | important for the expert reviewer to provide cause when rejecting a | |||
registration, and when possible to describe corrective action to | registration, and when possible to describe corrective action to | |||
permit the registration to proceed. The following table includes | permit the registration to proceed. The following table includes | |||
some example reasons to reject a registration with cause: | some example reasons to reject a registration with cause: | |||
o The registration is not a well-formed XML document. | ||||
o The registration is not a well-formed XML document that follows | o The registration has an intended use of "common", but there is no | |||
the DTD. | evidence the collation will be widely deployed, so it should be | |||
o The registration has intended use of "common", but there is no | ||||
evidence the collation will be widely deployed so it should be | ||||
listed as "limited". | listed as "limited". | |||
o The registration has an intended use of "common", but it is | ||||
redundant with the functionality of a previously registered | ||||
"common" collation. | ||||
o The registration has an intended use of "common", but the | ||||
specification is not detailed enough to allow interoperable | ||||
implementations by others. | ||||
o The registration has intended use of "common", but is redundant | o The collation identifier fails to precisely identify the version | |||
with the functionality of a previously registered "common" | numbers of relevant tables to use. | |||
collation. | ||||
o The collation name fails to precisely identify the version numbers | ||||
of relevant tables to use. | ||||
o The registration fails to meet one of the "MUST" requirements in | o The registration fails to meet one of the "MUST" requirements in | |||
Section 4. | Section 4. | |||
o The collation identifier fails to meet the syntax in Section 3. | ||||
o The collation name fails to meet the syntax in Section 3. | ||||
o The collation specification referenced in the registration is | o The collation specification referenced in the registration is | |||
vague or has optional features without a clear behavior specified. | vague or has optional features without a clear behavior specified. | |||
o The referenced specification does not adequately address security | o The referenced specification does not adequately address security | |||
considerations specific to that collation. | considerations specific to that collation. | |||
o The registration's operations are needlessly different from those | ||||
of traditional operations. | ||||
o The registration's XML is needlessly different from that of | ||||
already registered collations. | ||||
9. Initial Collations | ||||
This section describes an initial set of collations for the collation | ||||
registry. | ||||
9.1. ASCII Numeric Collation | ||||
9.1.1. ASCII Numeric Collation Description | ||||
The "i;ascii-numeric" collation is a simple collation intended for | ||||
use with arbitrarily sized unsigned decimal integers stored as | ||||
octet strings. US-ASCII digits (0x30 to 0x39) represent digits of | ||||
the numbers. Before converting from string to integer, the input | ||||
string is truncated at the first non-digit character. All input is | ||||
valid; strings which do not start with a digit represent positive | ||||
infinity. | ||||
The collation supports equality and ordering, but does not support | ||||
the substring operation. | ||||
The equality operation returns "match" if the two strings represent | ||||
the same number (i.e., leading zeroes and trailing non-digits are | ||||
disregarded) and "no-match" if the two strings represent different | ||||
numbers. | ||||
The ordering operation returns "less" if the first string represents | ||||
a smaller number than the second, "equal" if they represent the same | ||||
number, and "greater" if the first string represents a larger number | ||||
than the second. | ||||
Some examples: "0" is less than "1", and "1" is less than | ||||
"4294967298". "4294967298", "04294967298" and "4294967298b" are all | ||||
equal. "04294967298" is less than "". "", "x" and "y" are equal. | ||||
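The i;ascii-numeric rules can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: inputs are bytes objects, the function names are hypothetical, and math.inf stands in for "positive infinity".

```python
import math


def numeric_value(s):
    """Value of the leading US-ASCII digits of s, or infinity if there are none."""
    digits = bytearray()
    for octet in s:
        if 0x30 <= octet <= 0x39:
            digits.append(octet)
        else:
            break  # truncate at the first non-digit
    if not digits:
        return math.inf  # no leading digit: positive infinity
    return int(digits.decode("ascii"))  # Python ints have arbitrary precision


def ascii_numeric_order(first, second):
    a, b = numeric_value(first), numeric_value(second)
    if a < b:
        return "less"
    if a > b:
        return "greater"
    return "equal"


def ascii_numeric_equality(first, second):
    return "match" if numeric_value(first) == numeric_value(second) else "no-match"
```

Python's unbounded integers cover the "arbitrarily sized" requirement directly. In a language with fixed-width integers, an implementation would instead strip leading zeroes and compare the remaining digit strings first by length, then octet by octet.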
11. Security Considerations | 9.1.2. ASCII Numeric Collation Registration | |||
<?xml version='1.0'?> | ||||
<!DOCTYPE collation SYSTEM 'collationreg.dtd'> | ||||
<collation rfc="XXXX" scope="other" intendedUse="limited"> | ||||
<identifier>i;ascii-numeric</identifier> | ||||
<title>ASCII Numeric</title> | ||||
<operations>equality order</operations> | ||||
<specification>RFC XXXX</specification> | ||||
<owner>IETF</owner> | ||||
<submitter>chris.newman@sun.com</submitter> | ||||
</collation> | ||||
9.2. ASCII Casemap Collation | ||||
9.2.1. ASCII Casemap Collation Description | ||||
The "i;ascii-casemap" collation is a simple collation which operates | ||||
on octet strings and treats US-ASCII letters case-insensitively. It | ||||
provides equality, substring and ordering operations. All input is | ||||
valid. | ||||
Its equality, ordering and substring operations are as for i;octet, | ||||
except that first, the lower-case letters (octet values 97-122) in | ||||
each input string are changed to upper case (octet values 65-90). | ||||
Care should be taken when using OS-supplied functions to implement | ||||
this collation, because the collation is not locale sensitive. Functions such as | ||||
strcasecmp and toupper are sometimes locale sensitive and may | ||||
inappropriately map lower-case letters other than a-z to upper case. | ||||
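A minimal Python sketch of i;ascii-casemap, assuming byte-string inputs; the function names are hypothetical. It relies on bytes.upper(), which maps only the ASCII letters a-z and never consults the locale, so it avoids the strcasecmp/toupper pitfall noted above. After mapping, the comparison is ordinary octet-by-octet comparison.

```python
def casemap(s):
    # bytes.upper() changes only octets 97-122 (a-z) to 65-90 (A-Z);
    # unlike C toupper(), it ignores the process locale.
    return s.upper()


def octet_order(first, second):
    # Python compares bytes by unsigned octet value, with a proper prefix
    # sorting first, which matches the i;octet ordering.
    if first < second:
        return "less"
    if first > second:
        return "greater"
    return "equal"


def ascii_casemap_order(first, second):
    return octet_order(casemap(first), casemap(second))


def ascii_casemap_equality(first, second):
    return "match" if casemap(first) == casemap(second) else "no-match"


def ascii_casemap_substring(first, second):
    # The empty first string is found in any second string.
    return "match" if casemap(first) in casemap(second) else "no-match"
```

Because letters are mapped to upper case before comparing, punctuation whose octet values lie between the two letter ranges (such as "[" and "_") sorts after all letters. Non-ASCII octets, such as the UTF-8 encoding of a diaeresis, pass through unchanged.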
The i;ascii-casemap collation is well suited for use with many | ||||
Internet protocols and computer languages. Use with natural language | ||||
is often inappropriate: even though the collation apparently supports | ||||
languages such as Italian and English, in real-world use it tends to | ||||
stumble over words such as "naive", names such as "Llwyd", people and | ||||
place names containing non-ASCII, euro and pound sterling symbols, | ||||
quotation marks, dashes/hyphens, etc. | ||||
9.2.2. ASCII Casemap Collation Registration | ||||
<?xml version='1.0'?> | ||||
<!DOCTYPE collation SYSTEM 'collationreg.dtd'> | ||||
<collation rfc="XXXX" scope="local" intendedUse="common"> | ||||
<identifier>i;ascii-casemap</identifier> | ||||
<title>ASCII Casemap</title> | ||||
<operations>equality order substring</operations> | ||||
<specification>RFC XXXX</specification> | ||||
<owner>IETF</owner> | ||||
<submitter>chris.newman@sun.com</submitter> | ||||
</collation> | ||||
9.3. Nameprep Collation | ||||
9.3.1. Nameprep Collation Description | ||||
The "i;nameprep;v=1;uv=3.2" collation is an implementation of the | ||||
nameprep [7] specification based on normalization tables from Unicode | ||||
version 3.2. This collation applies the nameprep canonicalization | ||||
function to both input strings and then returns the result of the | ||||
i;octet collation on the canonicalized strings. While this collation | ||||
offers all three operations, the ordering operation it provides is | ||||
inadequate for use by the majority of the world. | ||||
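The equality operation of this collation can be sketched in Python under two assumptions: the inputs are UTF-8 octet strings, and a string that is not valid UTF-8 or that nameprep rejects yields "undefined". The sketch borrows nameprep() from CPython's encodings.idna module. That function is undocumented and internal; it applies the RFC 3491 mapping, normalization and prohibition steps using the Unicode 3.2 tables in the stringprep module. A production implementation would more likely use a dedicated stringprep library. The function names below are hypothetical.

```python
from encodings.idna import nameprep  # CPython internal helper, not a public API


def nameprep_canonical(octets):
    """Nameprep-ed form as UTF-8 octets, or None if the input is rejected."""
    try:
        return nameprep(octets.decode("utf-8")).encode("utf-8")
    except UnicodeError:
        # Invalid UTF-8, a prohibited character, or a bidi violation.
        return None


def nameprep_equality(first, second):
    canon_first = nameprep_canonical(first)
    canon_second = nameprep_canonical(second)
    if canon_first is None or canon_second is None:
        return "undefined"  # assumption: rejection maps to "undefined"
    # i;octet equality on the canonicalized strings
    return "match" if canon_first == canon_second else "no-match"
```

Ordering and substring would apply the i;octet operations to the same canonical forms.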
Version number 1 is applied to nameprep as specified in RFC 3491. If | ||||
the nameprep specification is revised without any changes that would | ||||
produce different results when given the same pair of input octet | ||||
strings, then the version number need not be changed. | ||||
The table numbers for tables used by nameprep are as follows: | ||||
+--------------+-----------------------+ | ||||
| Table Number | Table Name | | ||||
+--------------+-----------------------+ | ||||
| 1 | UnicodeData-3.2.0.txt | | ||||
| 2 | Table B.1 | | ||||
| 3 | Table B.2 | | ||||
| 4 | Table C.1.2 | | ||||
| 5 | Table C.2.2 | | ||||
| 6 | Table C.3 | | ||||
| 7 | Table C.4 | | ||||
| 8 | Table C.5 | | ||||
| 9 | Table C.6 | | ||||
| 10 | Table C.7 | | ||||
| 11 | Table C.8 | | ||||
| 12 | Table C.9 | | ||||
+--------------+-----------------------+ | ||||
9.3.2. Nameprep Collation Registration | ||||
<?xml version='1.0'?> | ||||
<!DOCTYPE collation SYSTEM 'collationreg.dtd'> | ||||
<collation rfc="XXXX" scope="i18n" intendedUse="common"> | ||||
<identifier>i;nameprep;v=1;uv=3.2</identifier> | ||||
<title>Nameprep</title> | ||||
<operations>equality order substring</operations> | ||||
<specification>RFC XXXX</specification> | ||||
<owner>IETF</owner> | ||||
<submitter>chris.newman@sun.com</submitter> | ||||
<version>1</version> | ||||
</collation> | ||||
9.4. Octet Collation | ||||
9.4.1. Octet Collation Description | ||||
The "i;octet" collation is a simple and fast collation intended for | ||||
use on binary octet strings rather than on character data. Protocols | ||||
that want to make this collation available have to do so by | ||||
explicitly allowing it. If not explicitly allowed, it MUST NOT be | ||||
used. It never returns an "undefined" result. It provides equality, | ||||
substring and ordering operations. | ||||
The ordering algorithm is as follows: | ||||
1. If both strings are the empty string, return the result "equal". | ||||
2. If the first string is empty and the second is not, return the | ||||
result "less". | ||||
3. If the second string is empty and the first is not, return the | ||||
result "greater". | ||||
4. If both strings begin with the same octet value, remove the first | ||||
octet from both strings and repeat this algorithm from step 1. | ||||
5. If the unsigned value (0 to 255) of the first octet of the first | ||||
string is less than the unsigned value of the first octet of the | ||||
second string, then return "less". | ||||
6. If this step is reached, return "greater". | ||||
This algorithm is roughly equivalent to the C library function memcmp | ||||
with appropriate length checks added. | ||||
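The numbered steps map almost line for line onto code. This Python sketch (the name octet_order is illustrative) keeps an index instead of literally removing octets. The result is equivalent, and the strings are never copied.

```python
def octet_order(first, second):
    """Follow the numbered i;octet ordering steps, iteratively."""
    i = 0
    while True:
        first_done, second_done = i == len(first), i == len(second)
        if first_done and second_done:  # step 1
            return "equal"
        if first_done:  # step 2
            return "less"
        if second_done:  # step 3
            return "greater"
        if first[i] == second[i]:  # step 4: skip the common octet
            i += 1
            continue
        # steps 5 and 6: indexing bytes yields unsigned values 0-255
        return "less" if first[i] < second[i] else "greater"
```

In Python the same results come from comparing bytes objects with < and >, which order by unsigned octet value and sort a proper prefix first. The explicit loop simply mirrors the steps.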
The equality operation returns "match" if the ordering algorithm would | ||||
return "equal". Otherwise the equality operation returns "no-match". | ||||
The substring operation returns "match" if the first string is the | ||||
empty string, or if there exists a substring of the second string of | ||||
length equal to the length of the first string which would result in | ||||
a "match" result from the equality operation. Otherwise the substring | ||||
operation returns "no-match". | ||||
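Equality and substring follow directly from the definitions above. This is a minimal Python sketch with hypothetical function names. It spells out the substring rule instead of relying on the in operator.

```python
def octet_equality(first, second):
    # The ordering algorithm returns "equal" exactly when the octets are identical.
    return "match" if first == second else "no-match"


def octet_substring(first, second):
    """Check whether first occurs anywhere within second."""
    if not first:
        return "match"  # the empty string matches any string
    width = len(first)
    # Try every window of second that has the same length as first.
    for start in range(len(second) - width + 1):
        if octet_equality(first, second[start:start + width]) == "match":
            return "match"
    return "no-match"
```

For bytes objects this loop gives the same answer as first in second, including the empty-string case.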
9.4.2. Octet Collation Registration | ||||
This collation is defined with intendedUse="limited" because it can | ||||
only be used by protocols that explicitly allow it. | ||||
<?xml version='1.0'?> | ||||
<!DOCTYPE collation SYSTEM 'collationreg.dtd'> | ||||
<collation rfc="XXXX" scope="i18n" intendedUse="limited"> | ||||
<identifier>i;octet</identifier> | ||||
<title>Octet</title> | ||||
<operations>equality order substring</operations> | ||||
<specification>RFC XXXX</specification> | ||||
<owner>IETF</owner> | ||||
<submitter>chris.newman@sun.com</submitter> | ||||
</collation> | ||||
10. IANA Considerations | ||||
Section 7 defines how to register collations with IANA. Section 9 | ||||
defines a list of predefined collations, which should be registered | ||||
when this document is approved and published as an RFC. | ||||
11. Security Considerations | ||||
Collations will normally be used with UTF-8 strings. Thus the | Collations will normally be used with UTF-8 strings. Thus the | |||
security considerations for UTF-8 [3] and stringprep [6] also apply | security considerations for UTF-8 [3], stringprep [6] and Unicode | |||
and are normative to this specification. | TR-36 [9] also apply and are normative to this specification. | |||
12. Open Issues | 12. Acknowledgements | |||
1. Is any Nameprep processing appropriate for the basic collation? | The authors want to thank all who have contributed to this document, | |||
Because a result of "0" from an ordering algorithm is | including at least John Cowan, Dave Cridland, Mark Davis, Lisa | |||
undesirable, much of the nameprep processing is inappropriate. | Dusseault, Frank Ellermann, Philip Guenther, Tony Hansen, Kjetil | |||
Furthermore, a result of "error" which is important for nameprep | Torgrim Homme, Michael Kay, Alexey Melnikov, Jim Melton and Abhijit | |||
is generally inappropriate as an internal result in an ordering | Menon-Sen. | |||
algorithm since it makes the results less intuitive. The sort | ||||
key table also eliminates most problematic characters from | 13. Open Issues | |||
consideration if the appropriate collation modifier is used. | ||||
Finally, exact compatibility with the Unicode Collation Algorithm | When converting this to an RFC, several things must be done: Martin | |||
is deemed desirable by the author, as even the smallest variation | Duerst's name request, checking for unfortunate page breaks, adding a | |||
may require implementation of largely duplicate code. However, | note to the RFC editor to possibly replace the 3066 reference, | |||
this decision is outside my expertise, so I welcome alternate | checking the SP SP "1" SP SP string for correctness. | |||
viewpoints. | ||||
Why no comments from anyone in the second half of the alphabet? | ||||
2. The ICU implementation of the UCA algorithm includes additional | ||||
algorithmic customizations such as the ability to be | 14. Change Log | |||
case-sensitive while at the same time being insensitive to | ||||
accents. Should these customizations be added to this | 14.1. Changes From -12 | |||
specification? | 1. Remove i;basic, to publish it as a separate RFC. Many documents | |||
are held up by this document, and this document is only held up | ||||
3. Should a format for customization data for the basic collation be | by i;basic. | |||
defined so that disconnected clients might have the option of | 2. Get rid of all the typos I could find. | |||
downloading that information? | 3. Specifically note that the "same" substring match need not always | |||
be returned in each of its guises. | ||||
4. Need to deal with the concept of "maybe" or "indeterminate" | ||||
results from matching or ordering. See what LDAP does as an | 14.2. Changes From -11 | |||
example. | 1. Remove the DTD. Permit well-considered extension of the XML. | |||
Enable the designated expert to block registrations due to | ||||
inappropriate or overly aggressive extension. | ||||
2. Rename collation names to collation identifiers. Having both | ||||
names and titles wasn't good. | ||||
3. Removed some open issues after trying to edit, and deciding that | ||||
the existing text was good. | ||||
4. Note that in Sieve, invalid strings sort after valid ones. | ||||
5. Define i;ascii-numeric as in RFC 2244. The task of this document is | ||||
to establish the registry, not change existing collations. | ||||
14.3. Changes From -10 | ||||
1. Updated contact details for Martin Duerst. | ||||
2. Various textual improvements. | ||||
3. The registration's file name now has a mandatory .xml extension. | ||||
4. Removed binding MUST for Sieve; it's more appropriate to put that | ||||
in 3028bis. | ||||
5. Syntax fix in registration example. | ||||
6. When there are multiple specifications, they now act in concert, | ||||
so it's possible to have e.g. a main specification and multiple | ||||
locale-specific supplements. It is not possible to name multiple | ||||
locations for the same specification any more. That'll return as | ||||
a comment feature. | ||||
7. Hopefully clearer exposition of i;ascii-casemap. | ||||
8. The ban on registering octet-based collations is lifted. One | ||||
hopes that the collation mailing list will present a suitable | ||||
threshold - not too high, not too low. | ||||
9. The DTD is published where IE can see it while looking at the | ||||
registrations. | ||||
14.4. Changes From -09 | ||||
1. Rename "error" to "undefined", as suggested by Mark Davis. The | ||||
new name makes for nicer prose IMO. | ||||
2. 7b=7 according to i;ascii-numeric. ACAP/Sieve need it. | ||||
3. Clarified that even though the collation specification returns a | ||||
list of substrings, the protocol/server need not use all of that | ||||
information. (As indeed IMAP SEARCH does not.) | ||||
4. Registrations go directly to the collation list _and_ to the | ||||
IANA, not to the IANA and from there forwarded to designated | ||||
expert. | ||||
5. Added an acknowledgements list and populated it with a quick grep | ||||
from my mailbox and memory. Surely incomplete. | ||||
6. Noted that in Sieve, "no-match" and "undefined" must be treated | ||||
in the same way by the engine. | ||||
7. Finish the rename from canonical to sort key. | ||||
8. Don't fall back to i;octet from any other collation. Return | ||||
undefined instead. Note that protocols may fall back to i;octet | ||||
to provide total ordering, if necessary. | ||||
9. Call the things operations everywhere, not operators/operations. | ||||
14.5. Changes From -08 | ||||
1. i;ascii-casemap instead of en;ascii-casemap. | ||||
2. UCA v 14. Changing to "latest version of UCA" was suggested, | ||||
but rejected since IETF standards reference stable | ||||
specifications, and "latest" is a moving target. | ||||
3. Removed all text on multi-valued attributes. Can be added once | ||||
there is a concrete need for it, either in an update to this | ||||
document or in the protocol that needs it. | ||||
4. "Collations MUST specify the canonicalization". Well, the UCA | ||||
doesn't, so I changed that to a MAY. | ||||
5. Add some text explaining why one might want to download tables. | ||||
6. Changed the remaining instances of "canonicalization" to talk | ||||
about sort keys. Added a note that a collation's sort key need | ||||
not be valid input to the same collation. | ||||
7. Reserve the word "default" and use it to name a protocol's | ||||
default collation, provided that protocol has a default | ||||
collation. In earlier versions of the draft, "*" was used to | ||||
name the default collation, but "*" also was implicitly defined | ||||
as the most general collation available. | ||||
8. Reinstate the different-length example of substring match. | ||||
Explain what an overlapping match is, by the canonical example. | ||||
9. Avoid the word "contain" when talking about substring matches. | ||||
Fewer terms is better. | ||||
10. Until -07, both a collation and equality/substring/sort were | ||||
called functions. In -07, the trio was renamed as operations. | ||||
Now, the DTD is updated to match. | ||||
11. Appeals go to the Apps AD before the general AD, as suggested by | ||||
Spencer Dawkins. | ||||
14.6. Changes From -06 | ||||
1. Clarified equality and identity: equality is as defined by a | ||||
collation, identity is stronger. | ||||
2. Added reference to | ||||
http://www.unicode.org/reports/tr10/#Searching. | ||||
3. Don't describe sort keys as a canonical representation of the | ||||
string. | ||||
4. Permit disconnected clients to use wildcards. (A disconnected | ||||
client has to resolve the wildcard itself, in the same way that a | ||||
server would.) | ||||
5. Change collation-wild to have the same length limit as collation. | ||||
6. Change to use "less" instead of "-1", etc., and specify that it's | ||||
just phrasing, not specification. | ||||
7. Don't describe the equality, substring and ordering operations as | ||||
functions. The definition of collation uses the word function | ||||
about the collation itself. A function that has three functions? | ||||
Something has to give. | ||||
8. Strike a requirement that selecting '*' is the same as not | ||||
selecting any collation. It restricted the protocol's default | ||||
too much. Existing code wasn't listening. | ||||
9. Left out the canonicalization/sort keys. | ||||
14.7. Changes From -05 | ||||
1. Added definitions of client, server and protocol, and prose to | ||||
specify that while the IANA registrations of collations are | ||||
written in terms of octet strings, implementations may do it | ||||
differently. | ||||
2. Changed the wording for ascii-numeric to treat the numbers as | ||||
numbers, etc. | ||||
3. Added explicit property requirements for the three functions, | ||||
e.g. that equality be symmetric. Added requirements that the | ||||
three functions be consistent, and that if any operations are | ||||
present, equality must be (needed for consistency). | ||||
4. Random editing, e.g. changing 'numbers' for ascii-numeric to | ||||
'integer numbers'. | ||||
5. Gave IMAP/SORT/COMPARATOR the same grandfather treatment as ACAP | ||||
and SIEVE. | ||||
14.8. Changes From -04 | ||||
Grammar and clarity changes only. One (weak) example added. No | ||||
substantive changes. | ||||
14.9. Changes From -03 | ||||
(This does not include all changes made.) | ||||
1. Checked and resolved most issues marked 'check whether this is | ||||
true' or similar. | ||||
2. Resolved nameprep issue: No. | ||||
3. Removed NULL for compatibility with existing collations (IMAP | ||||
SORT, Sieve). | ||||
4. There can be multiple owners and submitters. Say how. | ||||
5. Added a requirement that common collations must now be | ||||
interoperable. Insufficiently detailed specs cannot be "common". | ||||
6. Added a guideline that the operations provided by new collations | ||||
should be reminiscent of similar operations on existing | ||||
collations. | ||||
14.10. Changes From -02 | ||||
1. Changed from data being octet sequences (in UTF-8) to data being | ||||
character sequences (with octet collation as an exception). | ||||
2. Made XML format description much more structured. | ||||
3. Changed <submittor> to <submitter>, because this spelling is much | ||||
more common. | ||||
4. Defined 'protocol' to include query languages. | ||||
5. Reorganized document, in particular IANA considerations section | ||||
(which newly is just a list of pointers). | ||||
6. Added subsections, and a 'Structure of this Document' section. | ||||
7. Updated references. | ||||
8. Created a 'Change Log' chapter, with sections for each draft. | ||||
9. Reduced 'Open issues' section, open issues are now maintained at | ||||
http://www.w3.org/2004/08/ietf-collation. | ||||
13. Changes From -00 | 14.11. Changes From -01 | |||
Add IANA comment to open issues. Otherwise this is just a re-publish | ||||
to keep the document alive. | ||||
14.12. Changes From -00 | ||||
1. Replaced the term comparator with collation. While comparator is | 1. Replaced the term comparator with collation. While comparator is | |||
somewhat more precise because these abstract functions are used | somewhat more precise because these abstract functions are used | |||
for matching as well as ordering, collation is the term used by | for matching as well as ordering, collation is the term used by | |||
other parts of the industry. Thus I have changed the name to | other parts of the industry. Thus I have changed the name to | |||
collation for consistency. | collation for consistency. | |||
2. Remove all modifiers to the basic collation except for the | 2. Remove all modifiers to the basic collation except for the | |||
customization and the match rules. The other behavior | customization and the match rules. The other behavior | |||
modifications can be specified in a customization of the | modifications can be specified in a customization of the | |||
collation. | collation. | |||
3. Use ";" instead of "-" as delimiter between parameters to make | 3. Use ";" instead of "-" as delimiter between parameters to make | |||
names more URL-ish. | names more URL-ish. | |||
4. Add URL form for comparator reference. | 4. Add URL form for comparator reference. | |||
5. Switched registration template to use XML document. | 5. Switched registration template to use XML document. | |||
6. Added a number of useful registration template elements related | 6. Added a number of useful registration template elements related | |||
to the Unicode Collation Algorithm. | to the Unicode Collation Algorithm. | |||
7. Switched language from "custom" to "tailor" to match UCA language | 7. Switched language from "custom" to "tailor" to match UCA language | |||
for tailoring of the collation algorithm. | for tailoring of the collation algorithm. | |||
Normative References | 15. References | |||
15.1. Normative References | ||||
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement | [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement | |||
Levels", BCP 14, RFC 2119, March 1997. | Levels", BCP 14, RFC 2119, March 1997. | |||
[2] Crocker, D. and P. Overell, "Augmented BNF for Syntax | [2] Crocker, D. and P. Overell, "Augmented BNF for Syntax | |||
Specifications: ABNF", RFC 2234, November 1997. | Specifications: ABNF", RFC 4234, October 2005. | |||
[3] Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC | [3] Yergeau, F., "UTF-8, a transformation format of ISO 10646", | |||
2279, January 1998. | STD 63, RFC 3629, November 2003. | |||
[4] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource | [4] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform | |||
Identifiers (URI): Generic Syntax", RFC 2396, August 1998. | Resource Identifier (URI): Generic Syntax", RFC 3986, | |||
January 2005. | ||||
[5] Alvestrand, H., "Tags for the Identification of Languages", BCP | [5] Alvestrand, H., "Tags for the Identification of Languages", | |||
47, RFC 3066, January 2001. | BCP 47, RFC 3066, January 2001. | |||
[6] Hoffman, P. and M. Blanchet, "Preparation of Internationalized | [6] Hoffman, P. and M. Blanchet, "Preparation of Internationalized | |||
Strings ("stringprep")", RFC 3454, December 2002. | Strings ("stringprep")", RFC 3454, December 2002. | |||
[7] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for | [7] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for | |||
Internationalized Domain Names (IDN)", RFC 3491, March 2003. | Internationalized Domain Names (IDN)", RFC 3491, March 2003. | |||
[8] Davis, M. and K. Whistler, "Unicode Collation Algorithm version | [8] Davis, M. and K. Whistler, "Unicode Collation Algorithm version | |||
9", July 2002, <http://www.unicode.org/reports/tr10/ | 14", May 2005, | |||
tr10-9.html>. | <http://www.unicode.org/reports/tr10/tr10-14.html>. | |||
[9] Davis, M. and M. Suignard, "Unicode Security Considerations", | ||||
February 2006, <http://www.unicode.org/reports/tr36/>. | ||||
Informative References | 15.2. Informative References | |||
[9] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | [10] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | |||
Extensions (MIME) Part One: Format of Internet Message Bodies", | Extensions (MIME) Part One: Format of Internet Message Bodies", | |||
RFC 2045, November 1996. | RFC 2045, November 1996. | |||
[10] Myers, J., "Simple Authentication and Security Layer (SASL)", | [11] Myers, J., "Simple Authentication and Security Layer (SASL)", | |||
RFC 2222, October 1997. | RFC 2222, October 1997. | |||
[11] Newman, C. and J. Myers, "ACAP -- Application Configuration | [12] Newman, C. and J. Myers, "ACAP -- Application Configuration | |||
Access Protocol", RFC 2244, November 1997. | Access Protocol", RFC 2244, November 1997. | |||
[12] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA | ||||
Considerations Section in RFCs", BCP 26, RFC 2434, October | ||||
1998. | ||||
[13] Resnick, P., "Internet Message Format", RFC 2822, April 2001. | [13] Resnick, P., "Internet Message Format", RFC 2822, April 2001. | |||
[14] Freed, N. and J. Postel, "IANA Charset Registration | [14] Freed, N. and J. Postel, "IANA Charset Registration | |||
Procedures", BCP 19, RFC 2978, October 2000. | Procedures", BCP 19, RFC 2978, October 2000. | |||
[15] Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028, | [15] Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028, | |||
January 2001. | January 2001. | |||
URIs | [16] Crispin, M., "Internet Message Access Protocol - Version | |||
4rev1", RFC 3501, March 2003. | ||||
[16] <http://www.unicode.org/Public/3.2-Update/ | [17] Crispin, M. and K. Murchison, "Internet Message Access Protocol | |||
UnicodeData-3.2.0.txt> | - Sort and Thread Extensions", draft-ietf-imapext-sort-17.txt | |||
(work in progress), May 2004. | ||||
[18] Newman, C. and A. Gulbrandsen, "Internet Message Access | ||||
Protocol Internationalization", draft-ietf-imapext-i18n-06.txt | ||||
(work in progress), January 2006. | ||||
[17] <http://www.unicode.org/Public/3.2-Update/PropList-3.2.0.txt> | Authors' Addresses | |||
[18] <http://www.unicode.org/reports/tr10/allkeys-3.1.1.txt> | ||||
Author's Address | ||||
Chris Newman | Chris Newman | |||
Sun Microsystems | Sun Microsystems | |||
1050 Lakes Drive | 1050 Lakes Drive | |||
West Covina, CA 91790 | West Covina, CA 91790 | |||
US | US | |||
EMail: chris.newman@sun.com | Email: chris.newman@sun.com | |||
Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever possible, for example as "Dürst" in XML and HTML.) | ||||
Aoyama Gakuin University | ||||
5-10-1 Fuchinobe | ||||
Sagamihara, Kanagawa 229-8558 | ||||
Japan | ||||
Phone: +81 42 759 6329 | ||||
Fax: +81 42 759 6495 | ||||
Email: mailto:duerst@it.aoyama.ac.jp | ||||
URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/ | ||||
Arnt Gulbrandsen | ||||
Oryx Mail Systems GmbH | ||||
Schweppermannstr. 8 | ||||
Munich 81671 | ||||
Germany | ||||
Phone: +49 89 4502 9757 | ||||
Fax: +49 89 4502 9758 | ||||
Email: mailto:arnt@oryx.com | ||||
URI: http://www.oryx.com/arnt/ | ||||
Intellectual Property Statement | Intellectual Property Statement | |||
The IETF takes no position regarding the validity or scope of any | The IETF takes no position regarding the validity or scope of any | |||
intellectual property or other rights that might be claimed to | Intellectual Property Rights or other rights that might be claimed to | |||
pertain to the implementation or use of the technology described in | pertain to the implementation or use of the technology described in | |||
this document or the extent to which any license under such rights | this document or the extent to which any license under such rights | |||
might or might not be available; neither does it represent that it | might or might not be available; nor does it represent that it has | |||
has made any effort to identify any such rights. Information on the | made any independent effort to identify any such rights. Information | |||
IETF's procedures with respect to rights in standards-track and | on the procedures with respect to rights in RFC documents can be | |||
standards-related documentation can be found in BCP-11. Copies of | found in BCP 78 and BCP 79. | |||
claims of rights made available for publication and any assurances of | ||||
licenses to be made available, or the result of an attempt made to | Copies of IPR disclosures made to the IETF Secretariat and any | |||
obtain a general license or permission for the use of such | assurances of licenses to be made available, or the result of an | |||
proprietary rights by implementors or users of this specification can | attempt made to obtain a general license or permission for the use of | |||
be obtained from the IETF Secretariat. | such proprietary rights by implementers or users of this | |||
specification can be obtained from the IETF on-line IPR repository at | ||||
http://www.ietf.org/ipr. | ||||
The IETF invites any interested party to bring to its attention any | The IETF invites any interested party to bring to its attention any | |||
copyrights, patents or patent applications, or other proprietary | copyrights, patents or patent applications, or other proprietary | |||
rights which may cover technology that may be required to practice | rights that may cover technology that may be required to implement | |||
this standard. Please address the information to the IETF Executive | this standard. Please address the information to the IETF at | |||
Director. | ietf-ipr@ietf.org. | |||
Full Copyright Statement | Disclaimer of Validity | |||
Copyright (C) The Internet Society (2003). All Rights Reserved. | This document and the information contained herein are provided on an | |||
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | ||||
This document and translations of it may be copied and furnished to | OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | |||
others, and derivative works that comment on or otherwise explain it | ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | |||
or assist in its implementation may be prepared, copied, published | INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | |||
and distributed, in whole or in part, without restriction of any | INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | |||
kind, provided that the above copyright notice and this paragraph are | WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | |||
included on all such copies and derivative works. However, this | ||||
document itself may not be modified in any way, such as by removing | Copyright Statement | |||
the copyright notice or references to the Internet Society or other | ||||
Internet organizations, except as needed for the purpose of | Copyright (C) The Internet Society (2006). This document is subject | |||
developing Internet standards in which case the procedures for | to the rights, licenses and restrictions contained in BCP 78, and | |||
copyrights defined in the Internet Standards process must be | except as set forth therein, the authors retain all their rights. | |||
followed, or as required to translate it into languages other than | ||||
English. | ||||
The limited permissions granted above are perpetual and will not be | ||||
revoked by the Internet Society or its successors or assignees. | ||||
This document and the information contained herein is provided on an | ||||
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING | ||||
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING | ||||
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION | ||||
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF | ||||
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
Acknowledgment | Acknowledgment | |||
Funding for the RFC Editor function is currently provided by the | Funding for the RFC Editor function is currently provided by the | |||
Internet Society. | Internet Society. | |||
End of changes. 130 change blocks. | ||||
819 lines changed or deleted | 1082 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |