Diff: compare.txt - compare.txt

	compare.txt	compare.txt


	Network Working Group C. Newman	Network Working Group C. Newman
	Internet-Draft Sun Microsystems	Internet-Draft Sun Microsystems

	Expires: April 26, 2004 October 27, 2003	Expires: February 2, 2007 M. Duerst
		AGU
		A. Gulbrandsen
		Oryx
		August 1, 2006

	Internet Application Protocol Collation Registry	Internet Application Protocol Collation Registry

	draft-newman-i18n-comparator-01.txt	draft-newman-i18n-comparator-13.txt

	Status of this Memo	Status of this Memo


	This document is an Internet-Draft and is in full conformance with	By submitting this Internet-Draft, each author represents that any
	all provisions of Section 10 of RFC2026.	applicable patent or other IPR claims of which he or she is aware
		have been or will be disclosed, and any of which he or she becomes
		aware will be disclosed, in accordance with Section 6 of BCP 79.

	Internet-Drafts are working documents of the Internet Engineering	Internet-Drafts are working documents of the Internet Engineering

	Task Force (IETF), its areas, and its working groups. Note that other	Task Force (IETF), its areas, and its working groups. Note that
	groups may also distribute working documents as Internet-Drafts.	other groups may also distribute working documents as Internet-
		Drafts.

	Internet-Drafts are draft documents valid for a maximum of six months	Internet-Drafts are draft documents valid for a maximum of six months
	and may be updated, replaced, or obsoleted by other documents at any	and may be updated, replaced, or obsoleted by other documents at any

	time. It is inappropriate to use Internet-Drafts as reference	time. It is inappropriate to use Internet-Drafts as reference
	material or to cite them other than as "work in progress."	material or to cite them other than as "work in progress."


	The list of current Internet-Drafts can be accessed at http://	The list of current Internet-Drafts can be accessed at
	www.ietf.org/ietf/1id-abstracts.txt.	http://www.ietf.org/ietf/1id-abstracts.txt.

	The list of Internet-Draft Shadow Directories can be accessed at	The list of Internet-Draft Shadow Directories can be accessed at
	http://www.ietf.org/shadow.html.	http://www.ietf.org/shadow.html.


	This Internet-Draft will expire on April 26, 2004.	This Internet-Draft will expire on February 2, 2007.

	Copyright Notice	Copyright Notice


	Copyright (C) The Internet Society (2003). All Rights Reserved.	Copyright (C) The Internet Society (2006).

	Abstract	Abstract

	Many Internet application protocols include string-based lookup,	Many Internet application protocols include string-based lookup,
	searching, or sorting operations. However the problem space for	searching, or sorting operations. However the problem space for
	searching and sorting international strings is large, not fully	searching and sorting international strings is large, not fully
	explored, and is outside the area of expertise for the Internet	explored, and is outside the area of expertise for the Internet
	Engineering Task Force (IETF). Rather than attempt to solve such a	Engineering Task Force (IETF). Rather than attempt to solve such a
	large problem, this specification creates an abstraction framework so	large problem, this specification creates an abstraction framework so
	that application protocols can precisely identify a comparison	that application protocols can precisely identify a comparison
	function and the repertoire of comparison functions can be extended	function and the repertoire of comparison functions can be extended
	in the future.	in the future.

	Table of Contents	Table of Contents


	1. Introduction	1. Introduction
	1.1 Conventions Used in this Document	1.1. Conventions Used in this Document
	2. Collation Definition and Purpose	2. Collation Definition and Purpose
	3. Collation Name Syntax	2.1. Definition
	4. Collation Specification Requirements	2.2. Purpose
	5. Application Protocol Requirements	2.3. Some Other Terms Used in this Document
	6. Initial Collations	2.4. Sort Keys
	6.1 Octet Collation	3. Collation Identifier Syntax
	6.2 ASCII Numeric Collation	3.1. Basic Syntax
	6.3 ASCII Casemap Collation	3.2. Wildcards
	6.4 Nameprep Collation	3.3. Ordering Direction
	6.5 Basic Collation	3.4. URIs
	7. Use by ACAP and Sieve	3.5. Naming Guidelines
	8. IANA Considerations	4. Collation Specification Requirements
	8.1 Collation Registration Procedure	4.1. Collation/Server Interface
	8.2 Collation Registration Template	4.2. Operations Supported
	8.3 Octet Collation Registration	4.2.1. Validity
	8.4 ASCII Numeric Collation Registration	4.2.2. Equality
	8.5 Legacy English Casemap Collation Registration	4.2.3. Substring
	8.6 English Casemap Collation Registration	4.2.4. Ordering
	8.7 Nameprep Collation Registration	4.3. Sort Keys
	8.8 Basic Collation Registration	4.4. Use of Lookup Tables
	8.9 Basic Accent Sensitive Match Collation Registration	5. Application Protocol Requirements
	8.10 Basic Case Sensitive Match Collation Registration	5.1. Character Encoding
	8.11 Structure of Collation Registry	5.2. Operations
	8.12 Example Initial Registry Summary	5.3. Wildcards
	9. DTD for Collation Registration	5.4. Canonicalization Function
	10. Guidelines for Expert Reviewer	5.5. Disconnected Clients
	11. Security Considerations	5.6. Error Codes
	12. Open Issues	5.7. Octet Collation
	13. Changes From -00	6. Use by Existing Protocols
	Normative References	7. Collation Registration
	Informative References	7.1. Collation Registration Procedure
	Author's Address	7.2. Collation Registration Format
	Intellectual Property and Copyright Statements	7.2.1. Registration Template
		7.2.2. The collation Element
		7.2.3. The identifier Element
		7.2.4. The title Element
		7.2.5. The operations Element
		7.2.6. The specification Element
		7.2.7. The submitter Element
		7.2.8. The owner Element
		7.2.9. The version Element
		7.2.10. The variable Element
		7.2.11. The name Element
		7.2.12. The default Element
		7.2.13. The value Element
		7.3. Structure of Collation Registry
		7.4. Example Initial Registry Summary
		8. Guidelines for Expert Reviewer
		9. Initial Collations
		9.1. ASCII Numeric Collation
		9.1.1. ASCII Numeric Collation Description
		9.1.2. ASCII Numeric Collation Registration
		9.2. ASCII Casemap Collation
		9.2.1. ASCII Casemap Collation Description
		9.2.2. ASCII Casemap Collation Registration
		9.3. Nameprep Collation
		9.3.1. Nameprep Collation Description
		9.3.2. Nameprep Collation Registration
		9.4. Octet Collation
		9.4.1. Octet Collation Description
		9.4.2. Octet Collation Registration
		10. IANA Considerations
		11. Security Considerations
		12. Acknowledgements
		13. Open Issues
		14. Change Log
		14.1. Changes From -12
		14.2. Changes From -11
		14.3. Changes From -10
		14.4. Changes From -09
		14.5. Changes From -08
		14.6. Changes From -06
		14.7. Changes From -05
		14.8. Changes From -04
		14.9. Changes From -03
		14.10. Changes From -02
		14.11. Changes From -01
		14.12. Changes From -00
		15. References
		15.1. Normative References
		15.2. Informative References
		Authors' Addresses
		Intellectual Property and Copyright Statements


	1. Introduction	1. Introduction


	The ACAP [11] specification introduced the concept of a comparator	The ACAP [12] specification introduced the concept of a comparator
	(which we call collation in this document), but failed to create an	(which we call collation in this document), but failed to create an
	IANA registry. With the introduction of stringprep [6] and the	IANA registry. With the introduction of stringprep [6] and the
	Unicode Collation Algorithm [8], it is now time to create that	Unicode Collation Algorithm [8], it is now time to create that
	registry and populate it with some initial values appropriate for an	registry and populate it with some initial values appropriate for an
	international community. This specification replaces and generalizes	international community. This specification replaces and generalizes
	the definition of a comparator in ACAP and creates a collation	the definition of a comparator in ACAP and creates a collation
	registry.	registry.


	1.1 Conventions Used in this Document	1.1. Conventions Used in this Document

	The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"	The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
	in this document are to be interpreted as defined in "Key words for	in this document are to be interpreted as defined in "Key words for
	use in RFCs to Indicate Requirement Levels" [1].	use in RFCs to Indicate Requirement Levels" [1].

	The attribute syntax specifications use the Augmented Backus-Naur	The attribute syntax specifications use the Augmented Backus-Naur
	Form (ABNF) [2] notation including the core rules defined in Appendix	Form (ABNF) [2] notation including the core rules defined in Appendix

	A. This also inherits ABNF rules from Language Tags [5].	A. This also inherits ABNF rules from Language Tags [5].

		2. Collation Definition and Purpose


	2. Collation Definition and Purpose	2.1. Definition

	A collation is a named function which takes two arbitrary length	A collation is a named function which takes two arbitrary length

	octet strings (encoded in UTF-8 [3] for collations which operate on	strings as input and can be used to perform one or more of three
	characters) as input and can be used to perform one or more of three	basic comparison operations: equality test, substring match, and
	basic comparison operations: equality test, substring match and
	ordering test.	ordering test.


	Collations provide a multi-protocol abstraction layer for comparison	2.2. Purpose
	functions so the details of a particular comparison operation can be
	specified by someone with appropriate expertise independent of the
	application protocol that consumes that collation. This is similar
	to the way a charset [14] separates the details of octet to character
	mapping from a protocol specification such as MIME [9] or the way
	SASL [10] separates the details of an authentication mechanism from a
	protocol specification such as ACAP [11].


	Here a small diagram to help illustrate the value of this abstraction	Collations provide an abstraction layer for comparison functions so that these
	layer:	comparison functions can be used in multiple protocols. The details
		of a particular comparison operation can be specified by someone with
	+-----------------+	appropriate expertise independent of the application protocols that
	| Octet |	use that collation. This is similar to the way a charset [14]
	+-------------------+ +--| Collation Spec |	separates the details of octet to character mapping from a protocol
	| IMAP i18n SEARCH |--+ | +-----------------+	specification such as MIME [10] or the way SASL [11] separates the
		details of an authentication mechanism from a protocol specification
		such as ACAP [12].

		Here is a small diagram to help illustrate the value of this
		abstraction layer:

		+-------------------+ +-----------------+
		| IMAP i18n SEARCH |--+ | Basic |
		+-------------------+ | +--| Collation Spec |
		| | +-----------------+
	+-------------------+ | +-------------+ | +-----------------+	+-------------------+ | +-------------+ | +-----------------+

	+--| Collation |--+--| A stringprep |	| ACAP i18n SEARCH |--+--| Collation |--+--| A stringprep |
	+-------------------+ | | Registry | | | Collation Spec |	+-------------------+ | | Registry | | | Collation Spec |

	| ACAP i18n SEARCH |--+ +-------------+ | +-----------------+	| +-------------+ | +-----------------+
	+-------------------+ | +-----------------+	+-------------------+ | | +-----------------+
	| | locale-specific |	| ...other protocol |--+ | | locale-specific |
	+--| Collation Spec |	+-------------------+ +--| Collation Spec |
	+-----------------+	+-----------------+

	Thus IMAP, ACAP and future application protocols with international	Thus IMAP, ACAP and future application protocols with international
	search capability simply specify how to interface to the collation	search capability simply specify how to interface to the collation

	registry instead of each protocol spec having to specify all the	registry instead of each protocol specification having to specify all
	collations it supports.	the collations it supports.

		2.3. Some Other Terms Used in this Document

		The terms client, server and protocol are used in somewhat unusual
		senses.

		Client means a user, or a program acting directly on behalf of a
		user. This may be a mail reader acting as an IMAP client, or it may
		be an interactive shell where the user can type protocol directly, or
		it may be a script or program written by the user.

		Server means a program that performs services requested by the
		client. This may be a traditional server such as an HTTP server, or
		it may be a Sieve [15] interpreter running a Sieve script written by
		a user. A server needs to use the operations provided by collations
		in order to fulfil the client's requests.

		The protocol describes how the client tells the server what it wants
		done, and (if applicable) how the server tells the client about the
		results. IMAP is a protocol by this definition, and so is the Sieve
		language.

		2.4. Sort Keys

		One component of a collation is a transformation which turns a string
		into a sort key, which is then used while sorting.

		The transformation can range from an identity mapping (e.g., the
		i;octet collation, Section 9.4) to a mapping which makes the string
		unreadable to a human.


	One component of a collation is a canonicalization function which can	This is an implementation detail of collations or servers. A
	be pre-applied to single strings and may enhance the performance of	protocol SHOULD NOT expose it, since some collations leave the sort
	subsequent comparison operations. Normally, this is an	key's format up to the implementation, and current conformant
	implementation detail of collations, but at times it may be useful	implementations are known to use different formats.
	for an application protocol to expose collation canonicalization over
	protocol. Collation canonicalization can range from an identity	3. Collation Identifier Syntax
	mapping (e.g., the i;octet collation) to a mapping which makes the
	string unreadable to a human (e.g., the basic collation).	3.1. Basic Syntax

	3. Collation Name Syntax	The collation identifier itself is a single US-ASCII string beginning
		with a letter and made up of letters, digits, and one of the
	The collation name itself is a single US-ASCII string beginning with	following 4 symbols: "-", ";", "=" and ".". The identifier MUST NOT
	a letter and made up of letters, digits, or one of the following 4	be longer than 254 characters.
	symbols: "-", ";", "=" or ".". The name MUST NOT be longer than 254
	characters.

	collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "."	collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "."


	collation-name = ALPHA *253collation-char	collation-id = ALPHA *253collation-char
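As an illustration of the collation-id grammar above, the following is a minimal Python sketch of a syntax check. The name is_collation_id is hypothetical and appears in neither draft. The check covers syntax only: it does not consult any registry and does not treat the reserved identifier "default" specially.

```python
import re

# collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "."
# collation-id   = ALPHA *253collation-char   (so at most 254 characters)
_COLLATION_ID = re.compile(r"[A-Za-z][A-Za-z0-9;=.\-]{0,253}")


def is_collation_id(candidate: str) -> bool:
    """Return True if candidate is syntactically a collation identifier."""
    # fullmatch anchors at both ends, so trailing junk is rejected.
    return _COLLATION_ID.fullmatch(candidate) is not None
```

Under these assumptions, "i;octet" and "i;ascii-casemap" pass, while an identifier starting with a digit or one longer than 254 characters is rejected.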


	The string a client uses to select a collation MAY contain a wildcard	The identifier "default" is reserved. For protocols which have a
	("*") character which matches zero or more collation-chars. Wildcard	default collation, "default" refers to that collation. For other
	characters MUST NOT be adjacent. Clients which support disconnected	protocols, the identifier "default" matches no collations, and
	operation SHOULD NOT use wildcards to select a collation, but clients	servers SHOULD treat it in the same way as they treat nonexistent
	which provide collation operations only when connected to the server	collations.
	MAY use wildcards. If the wildcard string matches multiple
	collations, the server SHOULD select the collation with the broadest	3.2. Wildcards
	scope (preferably international scope), the most recent table
	versions and the greatest number of supported operations. A single	The string a client uses to select a collation MAY contain one or
	wildcard character ("*") refers to the application protocol collation	more wildcard ("*") characters, each of which matches zero or more
	behavior that would occur if no explicit negotiation were used.	collation-chars. Wildcard characters MUST NOT be adjacent. If the wildcard
		string matches multiple collations, the server SHOULD select the
	When used as a protocol element for ordering, the collation name MAY	collation with the broadest scope (preferably international scope),
	be prefixed by either "+" or "-" to explicitly specify an ordering	the most recent table versions and the greatest number of supported
	direction. As mentioned previously, "+" has no effect on the	operations.
	ordering function, while "-" negates the result of the ordering
	function. In general, collation-order is used when a client requests
	a collation, and collation-sel is used with the server informs the
	client of the selected collation.

	collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"])	collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"])

	; MUST NOT exceed 255 characters total	; MUST NOT exceed 254 characters total
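The wildcard rule can be sketched in Python as follows. This is an illustrative sketch, not text from either draft: the names wildcard_to_regex, select_collations and MAX_WILD_LEN are hypothetical, and the length limit of 254 is taken from the newer column (the older column says 255). The sketch only finds the candidate collations; the tie-breaking preferences (broadest scope, most recent tables, most operations) are left to the server.

```python
import re

MAX_WILD_LEN = 254  # assumption: limit from the newer draft column
_CHAR = r"[A-Za-z0-9;=.\-]"  # collation-char

# Mirrors: collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"])
# The grammar itself rules out adjacent wildcards.
_WILD_SYNTAX = re.compile(r"(?:\*|[A-Za-z]\*?)(?:" + _CHAR + r"\*?)*")


def wildcard_to_regex(pattern):
    """Compile a collation-wild string into a regular expression."""
    if len(pattern) > MAX_WILD_LEN or not _WILD_SYNTAX.fullmatch(pattern):
        raise ValueError("not a valid collation-wild string: %r" % pattern)
    # Each "*" stands for zero or more collation-chars.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile((_CHAR + "*").join(literal_parts))


def select_collations(pattern, available):
    """Return the identifiers in available that the pattern matches."""
    regex = wildcard_to_regex(pattern)
    return [cid for cid in available if regex.fullmatch(cid)]
```

For example, with the identifiers "i;octet", "i;ascii-numeric" and "i;ascii-casemap" available, the pattern "i;ascii-*" selects the last two, and "i;**" is rejected because its wildcards are adjacent.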


	collation-sel = ["+" / "-"] collation-name	3.3. Ordering Direction

		When used as a protocol element for ordering, the collation
		identifier MAY be prefixed by either "+" or "-" to explicitly specify
		an ordering direction. "+" has no effect on the ordering operation,
		while "-" inverts the result of the ordering operation. In general,
		collation-order is used when a client requests a collation, and
		collation-selected is used when the server informs the client of the
		selected collation.

		collation-selected = ["+" / "-"] collation-id

	collation-order = ["+" / "-"] collation-wild	collation-order = ["+" / "-"] collation-wild
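A short Python sketch of the direction prefix follows. It assumes the ordering result is represented as -1, 0 or +1, and it uses plain octet comparison as a stand-in ordering operation; the function names are hypothetical and the sketch ignores which collation the identifier names.

```python
from functools import cmp_to_key


def parse_direction(selector):
    """Split an optional "+" or "-" prefix from a collation selector.

    Returns (sign, rest): sign is -1 for "-", otherwise +1.
    """
    if selector[:1] in ("+", "-"):
        return (-1 if selector[0] == "-" else 1), selector[1:]
    return 1, selector


def octet_order(first, second):
    """Stand-in ordering operation on bytes: -1, 0 or +1."""
    return (first > second) - (first < second)


def sort_octets(selector, items):
    """Sort bytes values, inverting the order for a "-" prefix."""
    sign, _identifier = parse_direction(selector)
    return sorted(items, key=cmp_to_key(lambda a, b: sign * octet_order(a, b)))
```

With this sketch, "+i;octet" and "i;octet" sort identically, while "-i;octet" yields the reverse order.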


	Some protocols are designed to use URIs to refer to collations rather	3.4. URIs
	than simple tokens. A special section of the IANA web page is
		Some protocols are designed to use URIs [4] to refer to collations
		rather than simple tokens. A special section of the IANA web page is
	reserved for such usage. The "collation-uri" form is used to refer	reserved for such usage. The "collation-uri" form is used to refer
	to a specific IANA registry entry for a specific named collation (the	to a specific IANA registry entry for a specific named collation (the
	collation registration may not actually be present if it is	collation registration may not actually be present if it is
	experimental). The "collation-auri" form is an abstract name for an	experimental). The "collation-auri" form is an abstract name for an

	ordering, a comparator pattern or a vendor private comparator.	ordering, a collation pattern or a vendor private collator.

	collation-uri = "http://www.iana.org/assignments/collation/"	collation-uri = "http://www.iana.org/assignments/collation/"

	collation-name ".xml"	collation-id ".xml"

	collation-auri = ( "http://www.iana.org/assignments/collation/"	collation-auri = ( "http://www.iana.org/assignments/collation/"

	collation-order [".xml"]) / other-uri	collation-order ".xml" ) / other-uri


	other-uri = absoluteURI	other-uri = <absoluteURI>
	; excluding the IANA collation namespace.	; excluding the IANA collation namespace.
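The collation-uri form is simple enough to sketch directly. The helper names below are hypothetical, and the sketch performs no validation of the identifier itself.

```python
IANA_COLLATION_BASE = "http://www.iana.org/assignments/collation/"


def collation_uri(collation_id):
    """Build the collation-uri for a collation identifier."""
    return IANA_COLLATION_BASE + collation_id + ".xml"


def collation_id_from_uri(uri):
    """Recover the identifier from a collation-uri.

    Returns None for URIs outside the IANA collation namespace.
    """
    if uri.startswith(IANA_COLLATION_BASE) and uri.endswith(".xml"):
        return uri[len(IANA_COLLATION_BASE):-len(".xml")]
    return None
```

No percent-encoding is applied here because every collation-char is already permitted in a URI path.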


		3.5. Naming Guidelines

	While this specification makes no absolute requirements on the	While this specification makes no absolute requirements on the

	structure of collation names, naming consistency is important, so the	structure of collation identifiers, naming consistency is important,
	following initial guidelines are provided.	so the following initial guidelines are provided.


	Collation names with an international audience typically begin with	Collation identifiers with an international audience typically begin
	"i;". Collation names intended for a particular language or locale	with "i;". Collation identifiers intended for a particular language
	typically begin with a language tag [5] followed by a ";". After the	or locale typically begin with a language tag [5] followed by a ";".
	first ";" is normally the name of the general collation algorithm	After the first ";" is normally the name of the general collation
	followed by a series of algorithm modifications separated by the ";"	algorithm, followed by a series of algorithm modifications separated
	delimiter. Parameterized modifications will use "=" to delimit the	by the ";" delimiter. Parameterized modifications will use "=" to
	parameter from the value. The version numbers of any lookup tables	delimit the parameter from the value. The version numbers of any
	used by the algorithm SHOULD be present as parameterized	lookup tables used by the algorithm SHOULD be present as
	modifications.	parameterized modifications.


	Collation names of the form ;vnd-domain.com; are reserved for	Collation identifiers of the form ;vnd-domain.com; are reserved for
	vendor-specific collations created by the owner of the domain name	vendor-specific collations created by the owner of the domain name

	following the "vnd-" prefix. Registration of such collations (or the	following the "vnd-" prefix (e.g. vnd-example.com for the vendor
	name space as a whole) with intended use of "Vendor" is encouraged	example.com). Registration of such collations (or the name space as
	when a public specification or open-source implementation is	a whole) with intended use of "Vendor" is encouraged when a public
	available, but is not required.	specification or open-source implementation is available, but is not
		required.

		4. Collation Specification Requirements

		4.1. Collation/Server Interface

		The collation itself defines what it operates on. Most collations
		are expected to operate on character strings. The i;octet
		(Section 9.4) collation operates on octet strings. The
		i;ascii-numeric (Section 9.1) collation operates on numbers.

		This specification defines the collation interface in terms of octet
		strings. However, implementations may choose to use character
		strings instead. Such implementations may not be able to implement
		e.g. i;octet. Since i;octet is not currently mandatory to implement
		for any protocol, this should not be a problem.


	4. Collation Specification Requirements	4.2. Operations Supported

	A collation specification MUST state which of the three basic	A collation specification MUST state which of the three basic

	functions are supported (equality, substring, ordering) and how to	operations are supported (equality, substring, ordering) and how to
	perform each of the supported functions on any two input	perform each of the supported operations on any two input character
	octet-strings including empty strings. Given a collation with a	strings including empty strings. Collations must be deterministic,
	specific name, and any two fixed input strings, the result MUST be	i.e. given a collation with a specific identifier, and any two fixed
	the same. The collation specification MUST state whether the	input strings, the result MUST be the same for the same operation.
	collation operates on raw octets or on characters (in which case the
	UTF-8 charset is presumed). Collations MUST be transitive.	In general, collation operations should behave as their names
		suggest. While a collation may be new, the operations are not, so
	A collation specification MUST describe the internal canonicalization	the new collation's operations should be similar to those of older
	algorithm. This algorithm can be applied to individual strings and	collations. For example, a date/time collation should not provide a
	the result strings can be stored to potentially optimize future	"substring" operation that would morph IMAP substring SEARCH into
	comparison operations. A collation MAY specify that the	e.g. a date-range search.
	canonicalization algorithm is the identity function. The output of
	the canonicalization algorithm MAY have no meaning to a human.	A nonobvious consequence of the rules for each collation operation is
		that for any single collation, either none or all of the operations
	Collations which use more than one customizable lookup table in a	can return "undefined". For example, it is not possible to have an
	documented format MUST assign numbers to the tables they use. This	equality operation that never returns "undefined" and a substring
	permits an application protocol command to access the tables used by	operation that occasionally does.
	a server collation.
		4.2.1. Validity
	o The equality function always returns "match" or "no-match" when
	supplied valid input and MAY return "error" if the input strings	The validity test takes one string as argument and returns valid if its
	are not valid UTF-8 strings or violate other collation	input string is valid input to the collation's other operations, and
	constraints.	invalid if not. (In other words, a string is valid if it is equal to
		itself according to the collation's equality operation.)
	o The substring matching function determines if the first string is
	a substring of the second string. A collation which supports	The validity test is provided by all collations. It MUST NOT be
	substring matching will automatically support the two special	listed separately in the collation registration.
	cases of substring matching: prefix and suffix matching if those
	special cases are supported by the application protocol. It	4.2.2. Equality
	returns "match" or "no-match" when supplied valid input and
	returns "error" when supplied invalid input.	The equality test always returns "match" or "no-match" when supplied
		valid input, and MAY return "undefined" if one or both input strings
	o The ordering function determines how two octet strings are	are not valid.
	ordered. It returns "-1" if the first string is listed before the
	second string according to the collation, "+1" if the second	The equality test MUST be reflexive and symmetric. For valid input,
	string is listed before the first string, and "0" if the two	it MUST be transitive.
	strings are equal. If the order of the two strings is reversed,
	the result of the ordering function of the collation MUST be	If a collation provides either a substring or an ordering test, it
	negated. In general, collations SHOULD NOT return "0" unless the	MUST also provide an equality test. The substring and/or ordering
	two octet sequences are identical.	tests MUST be consistent with the equality test.

	Since ordering is normally used to sort a list of items, "error"	In this specification, the return values of the equality test are
	is not a useful return value from the ordering function. Strings	called "match", "no-match" and "undefined". This is not a
	with errors that prevent the sorting algorithm from functioning	specification, merely a choice of phrasing.
	correctly should sort to the end of the list. Thus if the first
	string is invalid UTF-8 while the second string is valid, the	4.2.3. Substring
	result will be "+1". If the second string is invalid UTF-8 while
	the first string is valid, the result will be "-1". If the	The substring matching operation determines if the first string is a
	collation is character-based, and both strings are invalid UTF-8,	substring of the second string, i.e. if one or more substrings of the
	the result SHOULD match the result from the "i;octet" collation.	second string is equal to the first, as defined by the collation's
		equality operation.
	When the collation is used with a "+" prefix, the behavior is the
	same as when used with no prefix. When the collation is used with	A collation which supports substring matching will automatically
	a "-" prefix, results which would be "+1" are instead "-1" and	support two special cases of substring matching: prefix and suffix
	results which would be "-1" are instead "+1".	matching if those special cases are supported by the application
		protocol. It returns "match" or "no-match" when supplied valid input
	Unless otherwise specified by the collation or application protocol,	and returns "undefined" when supplied invalid input.
	a NULL string (as opposed to an empty string) is equal only to
	another NULL string, a NULL string is not a substring of any other
	string, and a NULL string sorts to a position after all non-NULL
	strings, but before strings which generate errors.

	Some application protocols will permit the use of multi-value
	attributes with a collation. This paragraph describes the rules that
	apply unless otherwise specified by the collation or application
	protocol. The equality and substring collation algorithms will be
	iterated over each pair of single values from the two inputs. If any
	combination produces an error, the result is an error. Otherwise, if
	any combination produces a "match", the result is a match. Otherwise
	the result is "no-match". For the ordering function, the smallest
	ordinal octet string from the first set of values is compared to the
	smallest ordinal octet string from the second set of values.
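The multi-value rules in the paragraph above can be sketched roughly as follows. This is only an illustration: the function names and the callable signatures are hypothetical, and "smallest" is assumed to mean smallest under the collation's own ordering function. Empty value sets are not handled.

```python
import functools
from itertools import product


def multi_value_match(test, first_values, second_values):
    """Combine single-value equality/substring results.

    `test` is assumed to be a callable returning "match", "no-match"
    or "error" for one pair of single values.
    """
    results = [test(a, b) for a, b in product(first_values, second_values)]
    if "error" in results:
        return "error"      # any error makes the whole result an error
    if "match" in results:
        return "match"      # otherwise any match is a match
    return "no-match"


def multi_value_order(order, first_values, second_values):
    """Compare the smallest value of each set.

    `order` is assumed to return -1, 0 or +1 for a pair of single values.
    """
    key = functools.cmp_to_key(order)
    return order(min(first_values, key=key), min(second_values, key=key))
```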

	Application protocols MAY return position information for substring	Application protocols MAY return position information for substring

	matches. If this is done, the position information MUST include both	matches. If this is done, the position information SHOULD include
	the starting offset and the ending offset in the string. This is	both the starting offset and the ending offset for each match. This
	important because more sophisticated collations can match strings of	is important because more sophisticated collations can match strings
	unequal length (for example, a pre-composed accented character will	of unequal length (for example, a pre-composed accented character can
	match a decomposed accented character).	match a decomposed accented character). In general, overlapping
		matches SHOULD be reported (as when "ana" occurs twice within
	Collation specifications intended for common use are expected to	"banana") although there are cases where a collation may decide not
	reference standards from standards bodies with significant experience	to. For example, in a collation which treats all whitespace
	dealing with the details of international character sets.	sequences as identical, the substring operation could be defined such
		that " 1 " (SP "1" SP) is reported just once within "  1  " (SP SP "1"
	5. Application Protocol Requirements	SP SP), not four times (SP SP 1 SP, SP 1 SP, SP 1 SP SP and SP SP 1
		SP SP).
	An application protocol which offers searching, substring matching
	and/or sorting and permits the use of characters outside the US-ASCII	A string is a substring of itself. The empty string is a substring
	charset needs to consider the following requirements and issues:	of all strings.

		Note that the substring operation of some collations can match
		strings of unequal length. For example, a pre-composed accented
		character can match a decomposed accented character. Unicode
		Collation Algorithm [8] discusses this in more detail.

		In this specification, the return values of the substring operation
		are called "match", "no-match" and "undefined". This is not a
		specification, merely a choice of phrasing.

		4.2.4. Ordering

		The ordering operation determines how two strings are ordered. It
		MUST be trichotomous and reflexive. For valid input, it MUST be
		transitive.

		Ordering returns "less" if the first string is listed before the
		second string according to the collation, "greater" if the second
		string is listed before the first string, and "equal" if the two
		strings are equal as defined by the collation's equality operation.
		If one or both strings are invalid, the result of ordering is
		"undefined".

		When the collation is used with a "+" prefix, the behavior is the
		same as when used with no prefix. When the collation is used with a
		"-" prefix, the result of the ordering operation of the collation
		MUST be reversed.
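As a rough sketch of the prefix rule, the wrapper below strips a leading "+" or "-" from a collation identifier and reverses the ordering result for "-". The helper names are hypothetical, and the sketch assumes that "equal" and "undefined" are left unchanged by the reversal, which the text above does not spell out.

```python
def octet_ordering(a: bytes, b: bytes) -> str:
    """Stand-in ordering operation that compares raw octets."""
    if a < b:
        return "less"
    if a > b:
        return "greater"
    return "equal"


def apply_prefix(identifier: str, order):
    """Return (bare identifier, ordering function) for a prefixed identifier."""
    if identifier.startswith("-"):
        flip = {"less": "greater", "greater": "less"}

        def reversed_order(a, b):
            result = order(a, b)
            # Assumption: "equal" and "undefined" pass through untouched.
            return flip.get(result, result)

        return identifier[1:], reversed_order
    if identifier.startswith("+"):
        return identifier[1:], order
    return identifier, order
```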

		In this specification, the return values of the ordering operation
		are called "less", "equal", "greater" and "undefined". This is not a
		specification, merely a choice of phrasing.

		4.3. Sort Keys

		A collation specification SHOULD describe the internal transformation
		algorithm to generate sort keys. This algorithm can be applied to
		individual strings and the result can be stored to potentially
		optimize future comparison operations. A collation MAY specify that
		the sort key is generated by the identity function. The sort key may
		have no meaning to a human. The sort key may not be valid input to
		the collation.
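One way to picture the stored-sort-key optimization is the sketch below, which computes each key once and keeps it next to the original string. The function name is hypothetical, and Python's ASCII-only bytes.upper() stands in for a real collation's transformation purely for illustration.

```python
def build_index(strings, sort_key):
    """Compute every sort key once and sort on the stored keys."""
    keyed = [(sort_key(s), s) for s in strings]
    # Later comparisons touch only the precomputed keys, not the inputs.
    keyed.sort(key=lambda pair: pair[0])
    return keyed


index = build_index([b"delta", b"Alpha", b"charlie", b"Bravo"], bytes.upper)
ordered = [original for _, original in index]
```

The keys in this sketch happen to be readable, but as the text notes, a real sort key need not be meaningful to a human or be valid input to the collation.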

		4.4. Use of Lookup Tables

		Some collations use customizable lookup tables, e.g. because the
		tables depend on locale and may be modified after shipping the
		software. Collations which use more than one customizable lookup
		table in a documented format MUST assign numbers to the tables they
		use. This permits an application protocol command to access the
		tables used by a server collation, so that clients and servers use
		the same tables.

		5. Application Protocol Requirements

		This section describes the requirements and issues that an
		application protocol needs to consider if it offers searching,
		substring matching and/or sorting, and permits the use of characters
		outside the US-ASCII charset.

		5.1. Character Encoding

		The protocol specification has to make sure that it is clear on which
		characters (rather than just octets) the collations are used. This
		can be done by specifying the protocol itself in terms of characters
		(e.g. in the case of a query language), by specifying a single
		character encoding for the protocol (e.g. UTF-8 [3]), or by
		carefully describing the relevant issues of character encoding
		labeling and conversion. In the latter case, details to consider
		include how to handle unknown charsets, any charsets which are
		mandatory-to-implement, any issues with byte-order that might apply,
		and any transfer encodings which need to be supported.

		5.2. Operations

		The protocol must specify which of the operations defined in this
		specification (equality matching, substring matching and ordering)
		can be invoked in the protocol, and how they are invoked. There may
		be more than one way to invoke an operation.

	The protocol MUST provide a mechanism for the client to select the	The protocol MUST provide a mechanism for the client to select the
	collation to use with equality matching, substring matching and	collation to use with equality matching, substring matching and
	ordering.	ordering.


	The protocol MUST specify how comparisons behave in the absence of an	If a protocol needs a total ordering and the collation chosen does
	explicit collation negotiation or when a collation negotiation of "*"	not provide it because the ordering operation returns "undefined" at
	is used. The protocol MAY specify that the default collation used in	least once, the recommended fallback is to sort all invalid strings
	such circumstances is sensitive to server configuration.	after the valid ones, and use i;octet to order the invalid strings.
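The recommended fallback for obtaining a total ordering might look roughly like this. The names total_order, is_utf8 and text_order are hypothetical, and the sketch assumes the collation's ordering returns "undefined" only when at least one input is invalid.

```python
import functools


def total_order(order, is_valid):
    """Build a sort key: valid strings first, invalid ones after, by octets."""
    ranks = {"less": -1, "equal": 0, "greater": 1}

    def compare(a, b):
        valid_a, valid_b = is_valid(a), is_valid(b)
        if valid_a and valid_b:
            return ranks[order(a, b)]   # the collation decides
        if valid_a:
            return -1                   # valid sorts before invalid
        if valid_b:
            return 1
        return (a > b) - (a < b)        # both invalid: octet order

    return functools.cmp_to_key(compare)


def is_utf8(value: bytes) -> bool:
    try:
        value.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def text_order(a: bytes, b: bytes) -> str:
    """Toy case-insensitive ordering, used only for this illustration."""
    x, y = a.decode("utf-8").lower(), b.decode("utf-8").lower()
    return "less" if x < y else "greater" if x > y else "equal"
```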

	The protocol SHOULD provide a way to list available collations	Although the collation's substring function provides a list of
	matching a given wildcard pattern or patterns.	matches, a protocol need not provide all that to the client. It may
		provide only the first matching substring, or even just the
		information that the substring search matched.

	If the protocol provides positional information for the results of a	If the protocol provides positional information for the results of a

	substring match, that positional information MUST fully specify the	substring match, that positional information SHOULD fully specify the
	substring in the result that matches independent of the length of the	substring(s) in the result that matches independent of the length of
	search string. For example, returning both the starting and ending	the search string. For example, returning both the starting and
	offset of the match would suffice, as would the starting offset and a	ending offset of the match would suffice, as would the starting
	length. Returning just the starting offset is not acceptable. This	offset and a length. Returning just the starting offset is not
	rule is necessary because advanced collations can treat strings of	acceptable. This rule is necessary because advanced collations can
	different lengths as equal (for example, pre-composed and decomposed	treat strings of different lengths as equal (for example, pre-
	accented characters).	composed and decomposed accented characters).
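To illustrate why a start offset alone is insufficient, the sketch below reports (start, end) pairs for a toy substring operation that compares strings after Unicode NFD normalization. This is not a registered collation: the function name is hypothetical, offsets are counted in code points, and the empty search string is not handled.

```python
import unicodedata


def find_matches(needle: str, haystack: str):
    """Return every (start, end) slice of haystack equal to needle under NFD."""
    target = unicodedata.normalize("NFD", needle)
    matches = []
    for start in range(len(haystack)):
        for end in range(start + 1, len(haystack) + 1):
            candidate = unicodedata.normalize("NFD", haystack[start:end])
            if candidate == target:
                matches.append((start, end))
    return matches


# A 4-code-point search string (pre-composed e-acute) matches a
# 5-code-point span (e followed by a combining acute accent).
spans = find_matches("caf\u00e9", "cafe\u0301 au lait")
```

With the inputs shown, the matched span is one code point longer than the search string, so a client given only the start offset could not recover where the match ends.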

	If the protocol permits the use of collations on stored character	5.3. Wildcards
	data which is not encoded with the UTF-8 charset, then the protocol
	specification has to describe relevant issues of the conversion.	The protocol MUST specify whether it allows the use of wildcards in
	Details to consider include how to handle unknown charsets, any	collation identifiers or not. If the protocol allows wildcards,
	charsets which are mandatory-to-implement, any issues with byte-order	then:
	that might apply, and any transfer encodings which need to be	The protocol MUST specify how comparisons behave in the absence of
	supported.	explicit collation negotiation or when a collation of "*" is
		requested. The protocol MAY specify that the default collation
		used in such circumstances is sensitive to server configuration.
		The protocol SHOULD provide a way to list available collations
		matching a given wildcard pattern or patterns.

		5.4. Canonicalization Function

		If the protocol uses a canonicalization function for strings, then
		use of collations MAY be appropriate for that function. As an
		example, many protocols use case independent strings. In most cases,
		a simple ASCII mapping to upper/lower case works well, as i;ascii-
		casemap offers. However, in some cases another collation may be
		better, e.g. to handle Turkish dotted/dotless i. Protocol designers
		should consider in each case whether to use a specifiable collation.


	If the protocol provides a canonicalization function for strings,	5.5. Disconnected Clients
	then use of collations MAY be appropriate for that function.

	If the protocol supports disconnected clients, then a mechanism for	If the protocol supports disconnected clients, then a mechanism for
	the client to precisely replicate the server's collation algorithm is	the client to precisely replicate the server's collation algorithm is
	likely desirable. Thus the protocol MAY wish to provide a command to	likely desirable. Thus the protocol MAY wish to provide a command to
	fetch lookup tables used by charset conversions and collations.	fetch lookup tables used by charset conversions and collations.


		5.6. Error Codes

	The protocol specification should consider assigning protocol error	The protocol specification should consider assigning protocol error
	codes for the following circumstances:	codes for the following circumstances:

		o The client requests the use of a collation by identifier or
	o The client requests the use of a collation by name or pattern, but	pattern, but no implemented collation matches that pattern.
	no implemented collation matches that pattern.	o The client attempts to use a collation for an operation that is
		not supported by that collation. For example, attempting to use
	o The client attempts to use a collation for a function that is not	the "i;ascii-numeric" collation for substring matching.
	supported by that collation. For example, attempting to use the
	"i;ascii-numeric" collation for a substring matching function.

	o The client uses an equality or substring matching collation and	o The client uses an equality or substring matching collation and
	the result is an error. It may be appropriate to distinguish	the result is an error. It may be appropriate to distinguish
	between the two input strings, particularly when one is supplied	between the two input strings, particularly when one is supplied
	by the client and one is stored by the server. It might also be	by the client and one is stored by the server. It might also be
	appropriate to distinguish the specific case of an invalid UTF-8	appropriate to distinguish the specific case of an invalid UTF-8
	string.	string.


	If the protocol permits the use of a collation with data structures	5.7. Octet Collation
	beyond those described in this specification (octet strings, NULL
	string, array of octet strings), the protocol MUST describe the
	default behavior for a collation with that data structure.


	6. Initial Collations	The i;octet (Section 9.4) collation is only usable with protocols
		based on octet-strings. Clients and servers MUST NOT use i;octet
		with other protocols.


	This section describes an initial set of collations for the collation	If the protocol permits the use of collations with data structures
	registry.	other than strings, the protocol MUST describe the default behavior
		for a collation with those data structures.
	6.1 Octet Collation

	The "i;octet" collation is a simple and fast collation intended for
	use on binary octet strings rather than on character data. It never
	returns an "error" result. It provides equality, substring and
	ordering functions. The ordering algorithm is as follows:

	1. If both strings are the empty string, return the result "0".

	2. If the first string is empty and the second is not, return the
	result "-1".


	3. If the second string is empty and the first is not, return the	6. Use by Existing Protocols
	result "+1".

	4. If both strings begin with the same octet value, remove the first
	octet from both strings and repeat this algorithm from step 1.


	5. If the unsigned value (0 to 255) of the first octet of the first	Both ACAP [12] and Sieve [15] are standards track specifications
	string is less than the unsigned value of the first octet of the	which used collations prior to the creation of this specification and
	second string, then return "-1".	registry. Those standards do not meet all the application protocol
		requirements described in Section 5.


	6. If this step is reached, return "+1".	These protocols allow the use of the i;octet (Section 9.4) collation
		working directly on UTF-8 data as used in these protocols.


	This algorithm is roughly equivalent to the C library function memcmp	In Sieve, all matches are either true or false. Accordingly, Sieve
	with appropriate length checks added.	servers must treat "undefined" and "no-match" results of the equality
		and substring operations as false, and only "match" as true.
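The six "i;octet" ordering steps listed in the earlier draft can be followed quite literally, as in the sketch below. The function names are hypothetical, and the equality and substring helpers anticipate the definitions given in the following paragraphs of that draft.

```python
def octet_order(a: bytes, b: bytes) -> int:
    """Step-by-step ordering: returns -1, 0 or +1."""
    while True:
        if not a and not b:
            return 0                  # step 1
        if not a:
            return -1                 # step 2
        if not b:
            return 1                  # step 3
        if a[0] == b[0]:
            a, b = a[1:], b[1:]       # step 4: drop first octet, repeat
            continue
        return -1 if a[0] < b[0] else 1   # steps 5 and 6 (unsigned octets)


def octet_equal(a: bytes, b: bytes) -> str:
    return "match" if octet_order(a, b) == 0 else "no-match"


def octet_substring(needle: bytes, haystack: bytes) -> str:
    if not needle:
        return "match"
    n = len(needle)
    for start in range(len(haystack) - n + 1):
        if octet_equal(needle, haystack[start:start + n]) == "match":
            return "match"
    return "no-match"
```

In Python the same ordering is available directly from comparing bytes objects; the loop is written out only to mirror the numbered steps.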


	The matching function returns "match" if the sorting algorithm would	In ACAP and Sieve, there are no invalid strings. In this document's
	return "0". Otherwise the matching function returns "no-match".	terms, invalid strings sort after valid strings.


	The substring function returns "match" if the first string is the	IMAP [16] also collates, although that is explicit only when the
	empty string, or if there exists a substring of the second string of	COMPARATOR [18] extension is used. The built-in IMAP substring
	length equal to the length of the first string which would result in	operation and the ordering provided by the SORT [17] extension may
	a "match" result from the equality function. Otherwise the substring	not meet the requirements made in this document.
	function returns "no-match".


	The associated canonicalization algorithm is the identity function.	Other protocols may be in a similar position.


	6.2 ASCII Numeric Collation	In IMAP, the default collation is i;ascii-casemap, because its
		operations most closely resemble IMAP's built-in operations.


	The "i;ascii-numeric" collation is a simple collation intended for	7. Collation Registration
	use with arbitrary sized decimal numbers stored as octet strings of
	US-ASCII digits (0x30 to 0x39). It supports equality and ordering,
	but does not support the substring function. The algorithm is as
	follows:

	1. If neither string begins with a digit, return "error" if
	matching, or the result of the "i;octet" collation for ordering.

	2. If the first string begins with a digit and the second string
	does not, return "error" if matching and "-1" for ordering.

	3. If the second string begins with a digit and the first string
	does not, return "error" if matching and "+1" for ordering.

	4. Let "n" be the number of digits at the beginning of the first
	string, and "m" be the number of digits at the beginning of the
	second string.

	5. If n is equal to m, return the result of the "i;octet" collation.

	6. If n is greater than m, prepend a string of "n - m" zeros to the
	second string and return the result of the "i;octet" collation.

	7. If m is greater than n, prepend a string of "m - n" zeros to the
	first string and return the result of the "i;octet" collation.

	The associated canonicalization algorithm is to truncate the input
	string at the first non-digit character.
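A minimal sketch of the seven "i;ascii-numeric" steps from the earlier draft follows. The function names are hypothetical. The sketch assumes that, once both strings start with digits, the comparison applies to the leading digit runs only (in line with the truncating canonicalization), which the steps do not state explicitly.

```python
def _leading_digits(value: bytes) -> bytes:
    """Return the run of US-ASCII digits (0x30 to 0x39) at the start."""
    n = 0
    while n < len(value) and 0x30 <= value[n] <= 0x39:
        n += 1
    return value[:n]


def _octet(a: bytes, b: bytes) -> int:
    return (a > b) - (a < b)


def ascii_numeric_order(a: bytes, b: bytes) -> int:
    da, db = _leading_digits(a), _leading_digits(b)
    if not da and not db:
        return _octet(a, b)           # step 1: fall back to octet order
    if not db:
        return -1                     # step 2: only the first has digits
    if not da:
        return 1                      # step 3: only the second has digits
    width = max(len(da), len(db))     # steps 4 to 7: zero-pad the shorter
    return _octet(da.rjust(width, b"0"), db.rjust(width, b"0"))


def ascii_numeric_equal(a: bytes, b: bytes) -> str:
    if not _leading_digits(a) or not _leading_digits(b):
        return "error"                # steps 1 to 3 when matching
    return "match" if ascii_numeric_order(a, b) == 0 else "no-match"
```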

	6.3 ASCII Casemap Collation
	The "en;ascii-casemap" collation is a simple collation intended for
	use with English language text in pure US-ASCII. It provides
	equality, substring and ordering functions. The algorithm first
	applies a canonicalization algorithm to both input strings which
	subtracts 32 (0x20) from all octet values between 97 (0x61) and 122
	(0x7A) inclusive. The result of the collation is then the same as
	the result of the "i;octet" collation for the canonicalized strings.
	Care should be taken when using OS-supplied functions to implement
	this collation as this is not locale sensitive, but functions such as
	strcasecmp and toupper can be locale sensitive.
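The canonicalization described above is small enough to write out by hand, which also sidesteps the locale concern. The sketch below is illustrative only and its function names are hypothetical.

```python
def ascii_casemap(value: bytes) -> bytes:
    """Subtract 32 from octets 97..122 (a-z); leave all others unchanged."""
    return bytes(o - 32 if 97 <= o <= 122 else o for o in value)


def casemap_order(a: bytes, b: bytes) -> int:
    """Octet ordering of the canonicalized strings: -1, 0 or +1."""
    ca, cb = ascii_casemap(a), ascii_casemap(b)
    return (ca > cb) - (ca < cb)
```

Because only octets 0x61 to 0x7A are touched, multi-octet UTF-8 sequences pass through unmodified, whatever the process locale.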


	For historical reasons, in the context of ACAP and Sieve, the name	7.1. Collation Registration Procedure
	"i;ascii-casemap" is a synonym for this collation.


	6.4 Nameprep Collation	The IETF will create a mailing list, collation@ietf.org, which can be
		used for public discussion of collation proposals prior to
		registration. Use of the mailing list is strongly encouraged. The
		IESG will appoint a designated expert who will monitor the
		collation@ietf.org mailing list and review registrations.


	The "i;nameprep;v=1;uv=3.2" collation is an implementation of the	The registration procedure begins when a completed registration
	nameprep [7] specification based on normalization tables from Unicode	template is sent to iana@iana.org and collation@ietf.org. The
	version 3.2. This collation applies the nameprep canonicalization
	function to both input strings and then returns the result of the
	i;octet collation on the canonicalized strings. While this collation
	offers all three functions, the ordering function it provides is
	inadequate for use by the majority of the world.

	Version number 1 is applied to nameprep as specified in RFC 3491. If
	the nameprep specification is revised without any changes that would
	produce different results when given the same pair of input octet
	strings, then the version number will remain unchanged.

	The table numbers for tables used by nameprep are as follows:

	+--------------+-----------------------+
	\| Table Number \| Table Name \|
	+--------------+-----------------------+
	\| 1 \| UnicodeData-3.2.0.txt \|
	\| 2 \| Table B.1 \|
	\| 3 \| Table B.2 \|
	\| 4 \| Table C.1.2 \|
	\| 5 \| Table C.2.2 \|
	\| 6 \| Table C.3 \|
	\| 7 \| Table C.4 \|
	\| 8 \| Table C.5 \|
	\| 9 \| Table C.6 \|
	\| 10 \| Table C.7 \|
	\| 11 \| Table C.8 \|
	\| 12 \| Table C.9 \|
	+--------------+-----------------------+

	6.5 Basic Collation

	The basic collation is intended to provide tolerable results for a
	number of languages for all three functions (equality, substring and
	ordering) so it is suitable as a mandatory-to-implement collation for
	protocols which include ordering support. The ordering function of
	the basic collation is the Unicode Collation Algorithm [8] version 9
	(UCAv9).

	The equality and substring functions are created as described in
	UCAv9 section 8. While that section is informative to UCAv9, it is
	normative to this collation specification.

	This collation is based on Unicode version 3.2, with the following
	tables relevant:

	1. For the normalization step, UnicodeData-3.2.0.txt [16] is used.
	Column 5 is used to determine the canonical decomposition, while
	column 3 contains the canonical combining classes necessary to
	attain canonical order.

	2. The table of characters which require a logical order exception
	is a subset of the table in PropList-3.2.0.txt [17] and is
	included here:

	0E40..0E44 ; Logical_Order_Exception
	# Lo [5] THAI CHARACTER SARA E..THAI CHARACTER SARA AI MAIMALAI
	0EC0..0EC4 ; Logical_Order_Exception
	# Lo [5] LAO VOWEL SIGN E..LAO VOWEL SIGN AI

	# Total code points: 10

	3. The table used to translate normalized code points to a sort key
	is allkeys-3.1.1.txt [18].

	UCAv9 includes a number of configurable parameters and steps labelled
	as potentially optional. The following list summarizes the defaults
	used by this collation:

	o The logical order exception step is mandatory by default to
	support the largest number of languages.

	o Steps 2.1.1 to 2.1.3 are mandatory as the repertoire of the basic
	collation is intended to be large.

	o The second level in the sort key is evaluated forwards by default.

	o The variable weighting uses the "non-ignorable" option by default.

	o The semi-stable option is not used by default.

	o Support for exactly three levels of collation is the default
	behavior.

	o No preprocessing step is used by the basic collation prior to
	applying the UCAv9 algorithm. Note that an application protocol
	specification MAY require pre-processing prior to the use of any
	collations.

	o The equality and substring algorithms exclude differences at level
	2 and 3 by default (thus it is case-insensitive and ignores
	accentual distinctions).

	o The equality and substring algorithms use the "Whole Characters
	Only" feature described in UCAv9 section 8 by default.

	The exact collation name with these defaults is
	"i;basic;uca=3.1.1;uv=3.2". When a specification states that the
	basic collation is mandatory-to-implement, only this specific name is
	mandatory-to-implement.

	In order to allow modification of the optional behaviors, the
	following ABNF is used for variations of the basic collation:

	basic-collation = ("i" / Language-Tag) ";basic;uca=3.1.1;uv=3.2"
	[";match=accent" / ";match=case"]
	[";tailor=" 1*collation-char ]

	If multiple modifiers appear, they MUST appear in the order described
	above. The modifiers have the following meanings:

	match=accent Both the first and second levels of the sort keys are
	considered relevant to the equality and substring
	operations (rather than the default of first level
	only). This makes the matching functions sensitive to
	accentual distinctions.

	match=case The first three levels of sort keys are considered
	relevant to the equality and substring operations.
	This makes the matching functions sensitive to both
	case and accentual distinctions.

	The default weighting option is "non-ignorable". The "semi-stable"
	sort key option is not used by default.

	The canonicalization algorithm associated with this collation is the
	output of step 3 of the UCAv9 algorithm (described in section 4.3 of
	the UCA specification). This canonicalization is not suitable for
	human consumption.

	Finally, the UCAv9 algorithm permits the "allkeys" table to be
	tailored to a language. People who make quality tailorings are
	encouraged to register those tailorings using the collation registry.
	Tailoring names beginning with "x" are reserved for experimental use,
	are treated as "Limited use" and MUST NOT match wildcards if any
	registered collation is available that does match.

	7. Use by ACAP and Sieve

	Both ACAP [11] and Sieve [15] are standards track specifications
	which used collations prior to the creation of this specification and
	registry. Those standards do not meet all the application protocol
	requirements described in Section 5. For backwards compatibility,
	those protocols use the "i;ascii-casemap" instead of
	"en;ascii-casemap".

	8. IANA Considerations

	8.1 Collation Registration Procedure

	IANA will create a mailing list collation@iana.org which can be used
	for public discussion of collation proposals prior to registration.
	Use of the mailing list is encouraged but not required. The actual
	registration procedure will not begin until the completed
	registration template is sent to iana@iana.org. The IESG will
	appoint a designated expert who will monitor the collation@iana.org
	mailing list and review registrations forwarded from IANA. The
	designated expert is expected to tell IANA and the submitter of the	designated expert is expected to tell IANA and the submitter of the
	registration within two weeks whether the registration is approved,	registration within two weeks whether the registration is approved,
	approved with minor changes, or rejected with cause. When a	approved with minor changes, or rejected with cause. When a
	registration is rejected with cause, it can be re-submitted if the	registration is rejected with cause, it can be re-submitted if the
	concerns listed in the cause are addressed. Decisions made by the	concerns listed in the cause are addressed. Decisions made by the

	designated expert can be appealed to the IESG and subsequently follow	designated expert can be appealed to IESG Applications Area Director,
	the normal appeals procedure for IESG decisions.	then to the IESG. They follow the normal appeals procedure for IESG
		decisions.

	Collation registrations in a standards track, BCP or IESG-approved	Collation registrations in a standards track, BCP or IESG-approved

	experimental RFC are owned by the IESG and changes to the	experimental RFC are owned by the IETF, and changes to the
	registration follow normal procedures for updating such documents.	registration follow normal procedures for updating such documents.
	Collation registrations in other RFCs are owned by the RFC author(s).	Collation registrations in other RFCs are owned by the RFC author(s).
	Other collation registrations are owned by the individual(s) listed	Other collation registrations are owned by the individual(s) listed
	in the contact field of the registration and IANA will preserve this	in the contact field of the registration and IANA will preserve this
	information. Changes to a registration MUST be approved by the	information. Changes to a registration MUST be approved by the

	owner. In the event the owner can't be contacted for a period of one	owner. In the event the owner cannot be contacted for a period of
	month and a change is deemed necessary, the IESG MAY re-assign	one month and a change is deemed necessary, the IESG MAY re-assign
	ownership to an appropriate party.	ownership to an appropriate party.


	8.2 Collation Registration Template	7.2. Collation Registration Format

	Registration of a collation is done by sending a well-formed XML	Registration of a collation is done by sending a well-formed XML

	document that validates with collationreg.dtd (Section 9). The	document to collation@ietf.org and iana@iana.org.
	registration MUST include a collation element that MAY include an
	"rfc=" attribute if the specification is in an RFC and MUST include a
	scope attribute of "i18n", "local" or "other" and an intendedUse
	attribute of "common", "limited", "vendor", or "deprecated".


	The collation element contains the other elements in the	7.2.1. Registration Template
	registration. The mandatory name element gives the precise name of
	the comparator. The mandatory title element gives the title of the
	comparator. The mandatory functions element lists which of the three
	functions the comparator provides. The mandatory specification
	element describes where to find the specification, and MAY have a URI
	attribute. The submittor element provides an RFC 2822 email address
	for the person who submitted the registration. It is optional if the
	owner element contains an email address. The mandatory owner element
	contains either the four letters "IETF" or an email address of the
	owner of the registration. The optional version element is included
	when the registration is likely to be revised or has been revised in
	such a way that the results change for certain input strings. The
	optional UnicodeVersion element indicates the version number of the
	UnicodeData file on which the collation is based. The optional
	UCAVersion element specifies the version of the Unicode Collation
	Algorithm on which the collation is based. The optional
	UCAMatchLevel element specifies the number of Unicode Collation
	Algorithm sort key levels used for the equality and substring
	operations.

	Here is a template for the registration:	Here is a template for the registration:


	<?xml verison='1.0'?>	<?xml version='1.0'?>
	<!DOCTYPE rfc SYSTEM 'collationreg.dtd'>	<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
	<collation rfc="XXXX" scope="i18n" intendedUse="common">	<collation rfc="YYYY" scope="i18n" intendedUse="common">
	<name>collation name</name>	<identifier>collation identifier</identifier>
	<title>technical title for collation</title>	<title>technical title for collation</title>

	<functions>equality order substring</functions>	<operations>equality order substring</operations>
	<specification>specification reference</specification>	<specification>specification reference</specification>
	<owner>email address of owner or IETF</owner>	<owner>email address of owner or IETF</owner>

	<submittor>email address of submittor<submittor>	<submitter>email address of submitter</submitter>
	<version>1</version>	<version>1</version>

	<UnicodeVersion>3.2</UnicodeVersion>
	<UCAVersion>3.1.1</UCAVersion>
	</collation>	</collation>


		7.2.2. The collation Element

		The root of the registration document MUST be a <collation> element.
		The collation element contains the other elements in the
		registration, which are described in the following sub-subsections,
		in the order given here.

		The <collation> element MAY include an "rfc=" attribute if the
		specification is in an RFC. The "rfc=" attribute gives only the
		number of the RFC, without any prefix, such as "RFC", or suffix, such
		as ".txt".

		The <collation> element MUST include a "scope=" attribute, which MUST
		have one of the values "i18n", "local" or "other".

		The <collation> element MUST include an "intendedUse=" attribute,
		which must have one of the values "common", "limited", "vendor", or
		"deprecated". Collation specifications intended for "common" use are
		expected to reference standards from standards bodies with
		significant experience dealing with the details of international
		character sets.

	Be aware that future revisions of this specification may add	Be aware that future revisions of this specification may add

	additional function types, as well as additional XML attributes and	additional function types, as well as additional XML attributes,
	values. Any system which automatically parses these XML documents	values and elements. Any system which automatically parses these XML
	MUST take this into account to preserve future compatibility.	documents MUST take this into account to preserve future
		compatibility.
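A parser written with future compatibility in mind might simply read the parts it understands and skip the rest, as in this sketch. The function name, the returned dictionary layout and the choice of Python's xml.etree.ElementTree are assumptions made for illustration; the element and attribute names are those of the newer template.

```python
import xml.etree.ElementTree as ET

KNOWN_OPERATIONS = {"equality", "order", "substring"}


def parse_registration(xml_text: str) -> dict:
    """Read the known parts of a registration; ignore anything unrecognized."""
    root = ET.fromstring(xml_text)
    if root.tag != "collation":
        raise ValueError("root element must be <collation>")
    listed = (root.findtext("operations") or "").split()
    return {
        "identifier": root.findtext("identifier"),
        "title": root.findtext("title"),
        "scope": root.get("scope"),
        "intendedUse": root.get("intendedUse"),
        # Operation types added by later revisions are kept apart, not rejected.
        "operations": [op for op in listed if op in KNOWN_OPERATIONS],
        "unknown_operations": [op for op in listed if op not in KNOWN_OPERATIONS],
    }
```

Unknown child elements and attributes are never looked up, so their presence does not cause a failure.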


	8.3 Octet Collation Registration	7.2.3. The identifier Element


	<?xml verison='1.0'?>	The <identifier> element gives the precise identifier of the
	<!DOCTYPE rfc SYSTEM 'collationreg.dtd'>	collation, e.g. i;ascii-casemap. The <identifier> element is
	<collation rfc="XXXX" scope="i18n" intendedUse="common">	mandatory.
	<name>i;octet</name>
	<title>Octet</title>
	<functions>equality order substring</functions>
	<specification>RFC XXXX</specification>
	<owner>IETF</owner>
	<submittor>chris.newman@sun.com<submittor>
	</collation>


	8.4 ASCII Numeric Collation Registration	7.2.4. The title Element


	<?xml verison='1.0'?>	The <title> element gives the title of the collation. The <title>
	<!DOCTYPE rfc SYSTEM 'collationreg.dtd'>	element is mandatory.
	<collation rfc="XXXX" scope="other" intendedUse="limited">
	<name>i;ascii-numeric</name>
	<title>ASCII Numeric</title>
	<functions>equality order</functions>
	<specification>RFC XXXX</specification>
	<owner>IETF</owner>
	<submittor>chris.newman@sun.com<submittor>
	</collation>


	8.5 Legacy English Casemap Collation Registration	7.2.5. The operations Element


	<?xml verison='1.0'?>	The <operations> element lists which of the three operations
	<!DOCTYPE rfc SYSTEM 'collationreg.dtd'>	("equality", "order" or "substring") the collation provides,
	<collation rfc="XXXX" scope="local" intendedUse="deprecated">	separated by single spaces. The <operations> element is mandatory.
	<name>i;ascii-casemap</name>
	<title>Legacy English Casemap</title>
	<functions>equality order substring</functions>
	<specification>RFC XXXX</specification>
	<owner>IETF</owner>
	<submittor>chris.newman@sun.com<submittor>
	</collation>


	8.6 English Casemap Collation Registration	7.2.6. The specification Element


	<?xml verison='1.0'?>	The <specification> element describes where to find the
	<!DOCTYPE rfc SYSTEM 'collationreg.dtd'>	specification. The <specification> element is mandatory. It MAY
	<collation rfc="XXXX" scope="local" intendedUse="common">	have a URI attribute. There may be more than one <specification>
	<name>en;ascii-casemap</name>	element, in which case they together form the specification.
	<title>English Casemap</title>
	<functions>equality order substring</functions>
	<specification>RFC XXXX</specification>
	<owner>IETF</owner>
	<submittor>chris.newman@sun.com<submittor>
	</collation>


	8.7 Nameprep Collation Registration	If it is discovered that parts of a collation specification conflict,
		a new revision of the collation is necessary, and the
		collation@ietf.org mailing list should be notified.


	<?xml verison='1.0'?>	7.2.7. The submitter Element
	<!DOCTYPE rfc SYSTEM 'collationreg.dtd'>
	<collation rfc="XXXX" scope="i18n" intendedUse="common">
	<name>i;nameprep;v=1;uv=3.2</name>
	<title>Nameprep</title>
	<functions>equality order substring</functions>
	<specification>RFC XXXX</specification>
	<owner>IETF</owner>
	<submittor>chris.newman@sun.com<submittor>
	<version>1</version>
	<UnicodeVersion>3.2</UnicodeVersion>
	</collation>


	8.8 Basic Collation Registration	The <submitter> element provides an RFC 2822 [13] email address for
		the person who submitted the registration. It is optional if the
		<owner> element contains an email address.


	<?xml verison='1.0'?>	There may be more than one <submitter> element.
	<!DOCTYPE rfc SYSTEM 'collationreg.dtd'>
	<collation rfc="XXXX" scope="i18n" intendedUse="common">
	<name>i;basic;uca=3.1.1;uv=3.2</name>
	<title>Basic</title>
	<functions>equality order substring</functions>
	<specification>RFC XXXX</specification>
	<owner>IETF</owner>
	<submittor>chris.newman@sun.com<submittor>
	<UnicodeVersion>3.2</UnicodeVersion>
	<UCAVersion>3.1.1</UCAVersion>
	<UCAMatchLevel>1</UCAMatchLevel>
	</collation>


	8.9 Basic Accent Sensitive Match Collation Registration	7.2.8. The owner Element


	<?xml verison='1.0'?>	The <owner> element contains either the four letters "IETF" or an
	<!DOCTYPE rfc SYSTEM 'collationreg.dtd'>	email address of the owner of the registration. The <owner> element
	<collation rfc="XXXX" scope="i18n" intendedUse="common">	is mandatory. There may be more than one <owner> element. If so,
	<name>i;basic;uca=3.1.1;uv=3.2;match=accent</name>	all owners are equal. Each owner can speak for all.
	<title>Basic Accent Sensitive Match</title>
	<functions>equality order substring</functions>
	<specification>RFC XXXX</specification>
	<owner>IETF</owner>
	<submittor>chris.newman@sun.com<submittor>
	<UnicodeVersion>3.2</UnicodeVersion>
	<UCAVersion>3.1.1</UCAVersion>
	<UCAMatchLevel>2</UCAMatchLevel>
	</collation>


	8.10 Basic Case Sensitive Match Collation Registration	7.2.9. The version Element


	<?xml verison='1.0'?>	The <version> element is included when the registration is likely to
	<!DOCTYPE rfc SYSTEM 'collationreg.dtd'>	be revised or has been revised in such a way that the results change
	<collation rfc="XXXX" scope="i18n" intendedUse="common">	for certain input strings. The <version> element is optional.
	<name>i;basic;uca=3.1.1;uv=3.2;match=case</name>
	<title>Basic Case Sensitive Match</title>	7.2.10. The variable Element
	<functions>equality order substring</functions>
	<specification>RFC XXXX</specification>	The <variable> element specifies an optional variable that can be used
	<owner>IETF</owner>	to tailor the collation's behaviour. The <variable> element is
	<submittor>chris.newman@sun.com<submittor>	optional. When it is used, it must contain <name> and <default>
	<UnicodeVersion>3.2</UnicodeVersion>	elements and may contain one or more <value> elements.
	<UCAVersion>3.1.1</UCAVersion>
	<UCAMatchLevel>3</UCAMatchLevel>
	</collation>


	8.11 Structure of Collation Registry	7.2.11. The name Element

		The <name> element specifies the name value of a variable. The
		<name> element is mandatory.

		7.2.12. The default Element

		The <default> element specifies the default value of a variable. The
		<default> element is mandatory.

		7.2.13. The value Element

		The <value> element specifies a legal value of a variable. The
		<value> element is optional. If one or more <value> elements are
		present, only those values are legal. If none is, then the
		variable's legal values do not form an enumerated set, and the rules
		MUST be specified in an RFC accompanying the registration.

		7.3. Structure of Collation Registry

	Once the registration is approved, IANA will store each XML	Once the registration is approved, IANA will store each XML

	registration document in a URL of the form http://www.iana.org/	registration document in a URL of the form
	assignments/collation/collation-name.xml where collation-name is the	http://www.iana.org/assignments/collation/collation-id.xml where
	contents of the name element in the registration. Both the submittor	collation-id is the contents of the identifier element in the
	and the designated expert is responsible for verifying that the XML	registration. Both the submitter and the designated expert are
	is well-formed and complies with the DTD. In the future, it is hoped	responsible for verifying that the XML is well-formed. The
	IANA will take over XML verification responsibility from the	registration document should avoid using new elements. If any are
	designated expert.	necessary, it is important to be consistent with other registrations.

	IANA will also maintain a text summary of the registry under the name	IANA will also maintain a text summary of the registry under the name
	http://www.iana.org/assignments/collation/summary.txt. This summary	http://www.iana.org/assignments/collation/summary.txt. This summary
	is divided into four sections. The first section is for collations	is divided into four sections. The first section is for collations
	intended for common use. This section is intended for collation	intended for common use. This section is intended for collation
	registrations published in IESG approved RFCs or for locally scoped	registrations published in IESG approved RFCs or for locally scoped
	collations from the primary standards body for that locale. The	collations from the primary standards body for that locale. The
	designated expert is encouraged to reject collation registrations	designated expert is encouraged to reject collation registrations
	with an intended use of "common" if the expert believes it should be	with an intended use of "common" if the expert believes it should be
	"limited", as it is desirable to keep the number of "common"	"limited", as it is desirable to keep the number of "common"
	registrations small and high quality. The second section is reserved	registrations small and high quality. The second section is reserved
	for limited use collations. The third section is reserved for	for limited use collations. The third section is reserved for
	registered vendor specific collations. The final section is reserved	registered vendor specific collations. The final section is reserved
	for deprecated collations.	for deprecated collations.


	8.12 Example Initial Registry Summary	7.4. Example Initial Registry Summary

	The following is an example of how IANA might structure the initial	The following is an example of how IANA might structure the initial
	registry summary.txt file:	registry summary.txt file:

	Collation Functions Scope Reference	Collation Functions Scope Reference
	--------- --------- ----- ---------	--------- --------- ----- ---------
	Common Use Collations:	Common Use Collations:

	i;octet e, o, s Other [RFC XXXX]
	i;nameprep;v=1;uv=3.2 e, o, s i18n [RFC XXXX]	i;nameprep;v=1;uv=3.2 e, o, s i18n [RFC XXXX]

	i;basic;uca=3.1.1;uv=3.2 e, o, s i18n [RFC XXXX]	i;ascii-casemap e, o, s Local [RFC XXXX]
	i;basic;uca=3.1.1;uv=3.2;match=accent e, o, s i18n [RFC XXXX]
	i;basic;uca=3.1.1;uv=3.2;match=case e, o, s i18n [RFC XXXX]
	en;ascii-casemap e, o, s Local [RFC XXXX]

	Limited Use Collations:	Limited Use Collations:

		i;octet e, o, s Other [RFC XXXX]
	i;ascii-numeric e, o Other [RFC XXXX]	i;ascii-numeric e, o Other [RFC XXXX]

	Vendor Collations:	Vendor Collations:

	Deprecated Collations:	Deprecated Collations:

	i;ascii-casemap e, o, s Local [RFC XXXX]

	References	References
	----------	----------

	[RFC XXXX] Newman, C., "Internet Application Protocol Collation	[RFC XXXX] Newman, C., Duerst, M., Gulbrandsen, A., "Internet
	Registry", RFC XXXX, Sun Microsystems, October 2003.	Application Protocol Collation Registry", RFC XXXX,
		Sun Microsystems, October 2013.
	9. DTD for Collation Registration


	<!-	8. Guidelines for Expert Reviewer
	DTD for Collation Registration Document

	Data types:

	entity description
	====== ===========
	NUMBER [0-9]+
	URI As defined in RFC 2396
	CTEXT printable ASCII text (no line-terminators)
	TEXT character data
	->
	<!ENTITY % NUMBER "CDATA">
	<!ENTITY % URI "CDATA">
	<!ENTITY % CTEXT "#PCDATA">
	<!ENTITY % TEXT "#PCDATA">
	<!ELEMENT collation (name,title,functions,specification+,owner+,
	submittor*,version?,UnicodeVersion?,
	UCAVersion?,UCAMatchLevel?)>
	<!ATTLIST collation
	rfc %NUMBER; "0"
	scope (i18n\|local\|other) #IMPLIED
	intendedUse (common\|limited\|vendor\|deprecated) #IMPLIED>
	<!ELEMENT name (%CTEXT;)>
	<!ELEMENT title (%CTEXT;)>
	<!ELEMENT functions (%CTEXT;)>
	<!ELEMENT specification (%TEXT;)>
	<!ATTLIST specification
	uri %URI; "">
	<!ELEMENT owner (%CTEXT;)>
	<!ELEMENT submittor (%CTEXT;)>
	<!ELEMENT version (%CTEXT;)>
	<!ELEMENT UnicodeVersion (%CTEXT;)>
	<!ELEMENT UCAVersion (%CTEXT;)>
	<!ELEMENT UCAMatchLevel (%CTEXT;)>

	10. Guidelines for Expert Reviewer

	The expert reviewer appointed by the IESG has fairly broad latitude	The expert reviewer appointed by the IESG has fairly broad latitude
	for this registry. While a number of collations are expected	for this registry. While a number of collations are expected
	(particularly customizations of the basic collation for localized	(particularly customizations of the basic collation for localized
	use), an explosion of collations (particularly common use collations)	use), an explosion of collations (particularly common use collations)
	is not desirable for widespread interoperability. However, it is	is not desirable for widespread interoperability. However, it is
	important for the expert reviewer to provide cause when rejecting a	important for the expert reviewer to provide cause when rejecting a
	registration, and when possible to describe corrective action to	registration, and when possible to describe corrective action to
	permit the registration to proceed. The following table includes	permit the registration to proceed. The following table includes
	some example reasons to reject a registration with cause:	some example reasons to reject a registration with cause:

		o The registration is not a well-formed XML document.
	o The registration is not a well-formed XML document that follows	o The registration has an intended use of "common", but there is no
	the DTD.	evidence the collation will be widely deployed, so it should be

	o The registration has intended use of "common", but there is no
	evidence the collation will be widely deployed so it should be
	listed as "limited".	listed as "limited".

		o The registration has an intended use of "common", but it is
		redundant with the functionality of a previously registered
		"common" collation.
		o The registration has an intended use of "common", but the
		specification is not detailed enough to allow interoperable
		implementations by others.


	o The registration has intended use of "common", but is redundant	o The collation identifier fails to precisely identify the version
	with the functionality of a previously registered "common"	numbers of relevant tables to use.
	collation.

	o The collation name fails to precisely identify the version numbers
	of relevant tables to use.

	o The registration fails to meet one of the "MUST" requirements in	o The registration fails to meet one of the "MUST" requirements in
	Section 4.	Section 4.

		o The collation identifier fails to meet the syntax in Section 3.
	o The collation name fails to meet the syntax in Section 3.

	o The collation specification referenced in the registration is	o The collation specification referenced in the registration is
	vague or has optional features without a clear behavior specified.	vague or has optional features without a clear behavior specified.


	o The referenced specification does not adequately address security	o The referenced specification does not adequately address security
	considerations specific to that collation.	considerations specific to that collation.

		o The registration's operations are needlessly different from those
		of traditional operations.
		o The registration's XML is needlessly different from that of
		already registered collations.

		9. Initial Collations

		This section describes an initial set of collations for the collation
		registry.

		9.1. ASCII Numeric Collation

		9.1.1. ASCII Numeric Collation Description

		The "i;ascii-numeric" collation is a simple collation intended for
		use with arbitrary sized unsigned decimal integer numbers stored as
		octet strings. US-ASCII digits (0x30 to 0x39) represent digits of
		the numbers. Before converting from string to integer, the input
		string is truncated at the first non-digit character. All input is
		valid; strings which do not start with a digit represent positive
		infinity.

		The collation supports equality and ordering, but does not support
		the substring operation.

		The equality operation returns "match" if the two strings represent
		the same number (i.e., leading zeroes and trailing nondigits are
		disregarded) and "no-match" if the two strings represent different
		numbers.

		The ordering operation returns "less" if the first string represents
		a smaller number than the second, "equal" if they represent the same
		number, and "greater" if the first string represents a larger number
		than the second.

		Some examples: "0" is less than "1", and "1" is less than
		"4294967298". "4294967298", "04294967298" and "4294967298b" are all
		equal. "04294967298" is less than "". "", "x" and "y" are equal.
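
		The description above can be sketched in Python as follows. This
		is an illustrative sketch, not a normative implementation; the
		function names are hypothetical. To cope with arbitrarily large
		numbers, the sketch compares digit strings by length and then
		octet by octet instead of converting them to machine integers.

```python
def ascii_numeric_key(s: bytes):
    """Return a tuple that sorts like the number s represents."""
    end = 0
    # Truncate at the first non-digit octet (digits are 0x30 to 0x39).
    while end < len(s) and 0x30 <= s[end] <= 0x39:
        end += 1
    if end == 0:
        # No leading digit: positive infinity, larger than any number.
        return (1, 0, b"")
    digits = s[:end].lstrip(b"0")  # leading zeroes are disregarded
    # With zeroes stripped, a longer digit string is a larger number;
    # equal-length digit strings compare octet by octet.
    return (0, len(digits), digits)


def ascii_numeric_equality(a: bytes, b: bytes) -> str:
    same = ascii_numeric_key(a) == ascii_numeric_key(b)
    return "match" if same else "no-match"


def ascii_numeric_order(a: bytes, b: bytes) -> str:
    ka, kb = ascii_numeric_key(a), ascii_numeric_key(b)
    if ka < kb:
        return "less"
    if ka > kb:
        return "greater"
    return "equal"
```

		For instance, ascii_numeric_order(b"04294967298", b"") yields
		"less", because the empty string represents positive infinity.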


	11. Security Considerations	9.1.2. ASCII Numeric Collation Registration

		<?xml version='1.0'?>
		<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
		<collation rfc="XXXX" scope="other" intendedUse="limited">
		<identifier>i;ascii-numeric</identifier>
		<title>ASCII Numeric</title>
		<operations>equality order</operations>
		<specification>RFC XXXX</specification>
		<owner>IETF</owner>
		<submitter>chris.newman@sun.com</submitter>
		</collation>

		9.2. ASCII Casemap Collation

		9.2.1. ASCII Casemap Collation Description

		The "i;ascii-casemap" collation is a simple collation which operates
		on octet strings and treats US-ASCII letters case-insensitively. It
		provides equality, substring and ordering operations. All input is
		valid.

		Its equality, ordering and substring operations are as for i;octet,
		except that first, the lower-case letters (octet values 97-122) in
		each input string are changed to upper case (octet values 65-90).

		Care should be taken when using OS-supplied functions to implement
		this collation as it is not locale sensitive. Functions such as
		strcasecmp and toupper are sometimes locale sensitive and may
		inappropriately map lower-case letters other than a-z to upper case.
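
		One way to avoid locale-sensitive library functions is to map
		the octets explicitly. The following Python sketch does so; the
		function names are hypothetical, and Python's built-in bytes
		comparison and containment test stand in for the i;octet
		operations.

```python
def ascii_upper(data: bytes) -> bytes:
    """Map octets 97-122 (a-z) to 65-90 (A-Z); leave all others alone."""
    return bytes(b - 32 if 97 <= b <= 122 else b for b in data)


def casemap_equality(a: bytes, b: bytes) -> str:
    return "match" if ascii_upper(a) == ascii_upper(b) else "no-match"


def casemap_order(a: bytes, b: bytes) -> str:
    ka, kb = ascii_upper(a), ascii_upper(b)
    # bytes objects compare by unsigned octet value, shorter prefix first.
    if ka < kb:
        return "less"
    if ka > kb:
        return "greater"
    return "equal"


def casemap_substring(needle: bytes, haystack: bytes) -> str:
    # The first string is searched for within the second.
    found = ascii_upper(needle) in ascii_upper(haystack)
    return "match" if found else "no-match"
```

		Octets outside the range 97-122 are never changed, so the UTF-8
		encodings of non-ASCII letters such as "é" and "É" remain
		distinct.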

		The i;ascii-casemap collation is well suited to use with many
		internet protocols and computer languages. Use with natural language
		is often inappropriate: even though the collation apparently supports
		languages such as Italian and English, in real-world use it tends to
		stumble over words such as "naive", names such as "Llwyd", people and
		place names containing non-ASCII, euro and pound sterling symbols,
		quotation marks, dashes/hyphens, etc.

		9.2.2. ASCII Casemap Collation Registration

		<?xml version='1.0'?>
		<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
		<collation rfc="XXXX" scope="local" intendedUse="common">
		<identifier>i;ascii-casemap</identifier>
		<title>ASCII Casemap</title>
		<operations>equality order substring</operations>
		<specification>RFC XXXX</specification>
		<owner>IETF</owner>
		<submitter>chris.newman@sun.com</submitter>
		</collation>

		9.3. Nameprep Collation

		9.3.1. Nameprep Collation Description

		The "i;nameprep;v=1;uv=3.2" collation is an implementation of the
		nameprep [7] specification based on normalization tables from Unicode
		version 3.2. This collation applies the nameprep canonicalization
		function to both input strings and then returns the result of the
		i;octet collation on the canonicalized strings. While this collation
		offers all three operations, the ordering operation it provides is
		inadequate for use by the majority of the world.
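
		The ordering operation described above can be sketched in
		Python using the nameprep function from the standard
		encodings.idna module. Several points here are assumptions made
		for illustration only: the function name nameprep_order is
		hypothetical, the canonicalized strings are encoded as UTF-8
		before the octet comparison, and input that nameprep rejects is
		reported as "undefined".

```python
from encodings.idna import nameprep


def nameprep_order(a: str, b: str) -> str:
    """Order two strings by nameprep canonicalization, then by octets."""
    try:
        # Assumption: compare the UTF-8 encodings of the results.
        ka = nameprep(a).encode("utf-8")
        kb = nameprep(b).encode("utf-8")
    except UnicodeError:
        # Assumption: strings that nameprep prohibits are "undefined".
        return "undefined"
    # bytes objects compare octet by octet, as i;octet does.
    if ka < kb:
        return "less"
    if ka > kb:
        return "greater"
    return "equal"
```

		Because nameprep folds case, nameprep_order("EXAMPLE", "example")
		yields "equal".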

		Version number 1 is applied to nameprep as specified in RFC 3491. If
		the nameprep specification is revised without any changes that would
		produce different results when given the same pair of input octet
		strings, then the version number need not be changed.

		The table numbers for tables used by nameprep are as follows:

		+--------------+-----------------------+
		\| Table Number \| Table Name \|
		+--------------+-----------------------+
		\| 1 \| UnicodeData-3.2.0.txt \|
		\| 2 \| Table B.1 \|
		\| 3 \| Table B.2 \|
		\| 4 \| Table C.1.2 \|
		\| 5 \| Table C.2.2 \|
		\| 6 \| Table C.3 \|
		\| 7 \| Table C.4 \|
		\| 8 \| Table C.5 \|
		\| 9 \| Table C.6 \|
		\| 10 \| Table C.7 \|
		\| 11 \| Table C.8 \|
		\| 12 \| Table C.9 \|
		+--------------+-----------------------+

		9.3.2. Nameprep Collation Registration

		<?xml version='1.0'?>
		<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
		<collation rfc="XXXX" scope="i18n" intendedUse="common">
		<identifier>i;nameprep;v=1;uv=3.2</identifier>
		<title>Nameprep</title>
		<operations>equality order substring</operations>
		<specification>RFC XXXX</specification>
		<owner>IETF</owner>
		<submitter>chris.newman@sun.com</submitter>
		<version>1</version>
		</collation>

		9.4. Octet Collation

		9.4.1. Octet Collation Description

		The "i;octet" collation is a simple and fast collation intended for
		use on binary octet strings rather than on character data. Protocols
		that want to make this collation available have to do so by
		explicitly allowing it. If not explicitly allowed, it MUST NOT be
		used. It never returns an "undefined" result. It provides equality,
		substring and ordering operations.

		The ordering algorithm is as follows:
		1. If both strings are the empty string, return the result "equal".
		2. If the first string is empty and the second is not, return the
		result "less".
		3. If the second string is empty and the first is not, return the
		result "greater".
		4. If both strings begin with the same octet value, remove the first
		octet from both strings and repeat this algorithm from step 1.
		5. If the unsigned value (0 to 255) of the first octet of the first
		string is less than the unsigned value of the first octet of the
		second string, then return "less".
		6. If this step is reached, return "greater".

		This algorithm is roughly equivalent to the C library function memcmp
		with appropriate length checks added.
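
		The six steps above can be sketched in Python as follows. The
		function name octet_order is hypothetical, and the sketch
		advances an index instead of removing the first octet from each
		string.

```python
def octet_order(a: bytes, b: bytes) -> str:
    """Follow the i;octet ordering steps, one octet position at a time."""
    i = 0
    while True:
        a_empty = i >= len(a)
        b_empty = i >= len(b)
        if a_empty and b_empty:
            return "equal"      # step 1
        if a_empty:
            return "less"       # step 2
        if b_empty:
            return "greater"    # step 3
        if a[i] == b[i]:
            i += 1              # step 4: drop the first octet and repeat
            continue
        # Indexing a bytes object yields an unsigned value from 0 to 255.
        if a[i] < b[i]:
            return "less"       # step 5
        return "greater"        # step 6
```

		Python's built-in comparison of bytes objects gives the same
		ordering, so the explicit loop is shown only to mirror the
		steps.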

		The equality operation returns "match" if the ordering algorithm would
		return "equal". Otherwise the equality operation returns "no-match".

		The substring operation returns "match" if the first string is the
		empty string, or if there exists a substring of the second string of
		length equal to the length of the first string which would result in
		a "match" result from the equality operation. Otherwise the substring
		operation returns "no-match".
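
		A minimal Python sketch of the two operations just described
		follows. The function names are hypothetical, and the substring
		search is the naive one implied by the definition, not an
		optimized algorithm.

```python
def octet_equality(a: bytes, b: bytes) -> str:
    """Return "match" exactly when the two octet strings are identical."""
    return "match" if a == b else "no-match"


def octet_substring(first: bytes, second: bytes) -> str:
    """Return "match" if first is empty or occurs within second."""
    n = len(first)
    if n == 0:
        return "match"
    # Try every substring of second whose length equals len(first).
    for start in range(len(second) - n + 1):
        if octet_equality(first, second[start:start + n]) == "match":
            return "match"
    return "no-match"
```

		When the first string is longer than the second, there are no
		candidate substrings and the result is "no-match".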

		9.4.2. Octet Collation Registration

		This collation is defined with intendedUse="limited" because it can
		only be used by protocols that explicitly allow it.

		<?xml version='1.0'?>
		<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
		<collation rfc="XXXX" scope="i18n" intendedUse="limited">
		<identifier>i;octet</identifier>
		<title>Octet</title>
		<operations>equality order substring</operations>
		<specification>RFC XXXX</specification>
		<owner>IETF</owner>
		<submitter>chris.newman@sun.com</submitter>
		</collation>

		10. IANA Considerations

		Section 7 defines how to register collations with IANA. Section 9
		defines a list of predefined collations, which should be registered
		when this document is approved and published as an RFC.

		11. Security Considerations

	Collations will normally be used with UTF-8 strings. Thus the	Collations will normally be used with UTF-8 strings. Thus the

	security considerations for UTF-8 [3] and stringprep [6] also apply	security considerations for UTF-8 [3], stringprep [6] and Unicode
	and are normative to this specification.	TR-36 [9] also apply and are normative to this specification.


	12. Open Issues	12. Acknowledgements


	1. Is any Nameprep processing appropriate for the basic collation?	The authors want to thank all who have contributed to this document,
	Because a result of "0" from an ordering algorithm is	including at least John Cowan, Dave Cridland, Mark Davis, Lisa
	undesirable, much of the nameprep processing is inappropriate.	Dusseault, Frank Ellermann, Philip Guenther, Tony Hansen, Kjetil
	Furthermore, a result of "error" which is important for nameprep	Torgrim Homme, Michael Kay, Alexey Melnikov, Jim Melton and Abhijit
	is generally inappropriate as an internal result in an ordering	Menon-Sen.
	algorithm since it makes the results less intuitive. The sort
	key table also eliminates most problematic characters from	13. Open Issues
	consideration if the appropriate collation modifier is used.
	Finally, exact compatibility with the Unicode Collation Algorithm	When converting this to an RFC, several things must be done: Martin
	is deemed desirable by the author, as even the smallest variation	Duerst's name request, checking for unfortunate page breaks, adding a
	may require implementation of largely duplicate code. However,	note to the RFC editor to possibly replace the 3066 reference,
	this decision is outside my expertise, so I welcome alternate	checking the SP SP "1" SP SP string for correctness.
	viewpoints.
		Why no comments from anyone in the second half of the alphabet?
	2. The ICU implementation of the UCA algorithm includes additional
	algorithmic customizations such as the ability to be	14. Change Log
	case-sensitive while at the same time being insensitive to
	accents. Should these customizations be added to this	14.1. Changes From -12
	specification?	1. Remove i;basic, to publish it as a separate RFC. Many documents
		are held up by this document, and this document is only held up
	3. Should a format for customization data for the basic collation be	by i;basic.
	defined so that disconnected clients might have the option of	2. Get rid of all the typoes I could find.
	downloading that information?	3. Specifically note that the "same" substring match need not always
		be returned in each of its guises.
	4. Need to deal with the concept of "maybe" or "indeterminate"
	results from matching or ordering. See what LDAP does as an	14.2. Changes From -11
	example.	1. Remove the DTD. Permit well-considered extension of the XML.
		Enable the designated expert to block registrations due to
		inappropriate or overly aggressive extension.
		2. Rename collation names to collation identifiers. Having both
		names and titles wasn't good.
		3. Removed some open issues after trying to edit, and deciding that
		the existing text was good.
		4. Note that in Sieve, invalid strings sort after valid ones.
		5. Make i;ascii-numeric as in RFC2244. The task of this document is
		to establish the registry, not change existing collations.

		14.3. Changes From -10
		1. Updated contact details for Martin Duerst.
		2. Various textual improvements.
		3. The registration's file name now has a mandatory .xml extension.
		4. Removed binding MUST for Sieve; it's more appropriate to put that
		in 3028bis.
		5. Syntax fix in registration example.
		6. When there are multiple specifications, they now act in concert,
		so it's possible to have e.g. a main specification and multiple
		locale-specific supplements. It is not possible to name multiple
		locations for the same specification any more. That'll return as
		a comment feature.
		7. Hopefully clearer exposition of i;ascii-casemap.
		8. The ban on registering octet-based collations is lifted. One
		hopes that the collation mailing list will present a suitable
		threshold - not too high, not too low.
		9. The DTD is published where IE can see it while looking at the
		registrations.

		14.4. Changes From -09
		1. Rename "error" to "undefined", as suggested by Mark Davis. The
		new name makes for nicer prose IMO.

		2. 7b=7 according to i;ascii-numeric. ACAP/Sieve need it.
		3. Clarified that even though the collation specification returns a
		list of substrings, the protocol/server need not use all of that
		information. (As indeed IMAP SEARCH does not.)
		4. Registrations go directly to the collation list _and_ to the
		IANA, not to the IANA and from there forwarded to designated
		expert.
		5. Added an acknowledgements list and populated it with a quick grep
		from my mailbox and memory. Surely incomplete.
		6. Noted that in sieve, "no-match" and "undefined" must be treated
		in the same way by the engine.
		7. Finish the rename from canonical to sort key.
		8. Don't fall back to i;octet from any other collation. Return
		undefined instead. Note that protocols may fall back to i;octet
		to provide total ordering, if necessary.
		9. Call the things operations everywhere, not operators/operations.

		14.5. Changes From -08
		1. i;ascii-casemap instead of en;ascii-casemap.
		2. UCA v 14. Changing to "latest version of UCA" was suggested,
		but rejected since IETF standards reference stable
		specifications, and "latest" is a moving target.
		3. Removed all text on multi-valued attributes. Can be added once
		there is a concrete need for it, either in an update to this
		document or in the protocol that needs it.
		4. "Collations MUST specify the canonicalization". Well, the UCA
		doesn't, so I changed that to a MAY.
		5. Add some text explaining why one might want to download tables.
		6. Changed the remaining instances of "canonicalization" to talk
		about sort keys. Added a note that a collation's sort key need
		not be valid input to the same collation.
		7. Reserve the word "default" and use it to name a protocol's
		default collation, provided that protocol has a default
		collation. In earlier versions of the draft, "*" was used to
		name the default collation, but "*" also was implicitly defined
		as the most general collation available.
		8. Reinstate the different-length example of substring match.
		Explain what an overlapping match is, by the canonical example.
		9. Avoid the word "contain" when talking about substring matches.
		Fewer terms is better.
		10. Until -07, both a collation and equality/substring/sort were
		called functions. In -07, the trio was renamed as operations.
		Now, the DTD is updated to match.
		11. Appeals go to the Apps AD before the general AD, as suggested by
		Spencer Dawkins.

		14.6. Changes From -06
		1. Clarified equality and identity: equality is as defined by a
		collation, identity is stronger.
		2. Added reference to
		http://www.unicode.org/reports/tr10/#Searching.
		3. Don't describe sort keys as a canonical representation of the
		string.
		4. Permit disconnected clients to use wildcards. (A disconnected
		client has to resolve the wildcard itself, in the same way that a
		server would.)
		5. Change collation-wild to have the same length limit as collation.
		6. Change to use "less" instead of "-1", etc., and specify that it's
		just phrasing, not specification.
		7. Don't describe the equality, substring and ordering operations as
		functions. The definition of collation uses the word function
		about the collation itself. A function that has three functions?
		Something has to give.
		8. Strike a requirement that selecting '*' is the same as not
		selecting any collation. It restricted the protocol's default
		too much. Existing code wasn't listening.
		9. Left out the canonicalization/sort keys.

		14.7. Changes From -05
		1. Added definitions of client, server and protocol, and prose to
		specify that while the IANA registrations of collations are
		written in terms of octet strings, implementations may do it
		differently.
		2. Changed the wording for ascii-numeric to treat the numbers as
		numbers, etc.
		3. Added explicit property requirements for the three functions,
		e.g. that equality be symmetric. Added requirements that the
		three functions be consistent, and that if any operations are
		present, equality must be (needed for consistency).
		4. Random editing, e.g. changing 'numbers' for ascii-numeric to
		'integer numbers'.
		5. Gave IMAP/SORT/COMPARATOR the same grandfather treatment as ACAP
		and SIEVE.

		14.8. Changes From -04

		Grammar and clarity changes only. One (weak) example added. No
		substantive changes.

		14.9. Changes From -03

		(This does not include all changes made.)
		1. Checked and resolved most issues marked 'check whether this is
		true' or similar.
		2. Resolved nameprep issue: No.
		3. Removed NULL for compatibility with existing collations (IMAP
		SORT, Sieve).
		4. There can be multiple owners and submitters. Say how.
		5. Added a requirement that common collations must now be
		interoperable. Insufficiently detailed specs cannot be "common".
		6. Added a guideline that the operations provided by new collations
		should be reminiscent of similar operations on existing
		collations.

		14.10. Changes From -02

		1. Changed from data being octet sequences (in UTF-8) to data being
		character sequences (with octet collation as an exception).
		2. Made XML format description much more structured.
		3. Changed <submittor> to <submitter>, because this spelling is much
		more common.
		4. Defined 'protocol' to include query languages.
		5. Reorganized document, in particular IANA considerations section
		(which newly is just a list of pointers).
		6. Added subsections, and a 'Structure of this Document' section.
		7. Updated references.
		8. Created a 'Change Log' chapter, with sections for each draft.
		9. Reduced 'Open issues' section; open issues are now maintained at
		http://www.w3.org/2004/08/ietf-collation.


	13. Changes From -00	14.11. Changes From -01

		Add IANA comment to open issues. Otherwise this is just a re-publish
		to keep the document alive.

		14.12. Changes From -00

	1. Replaced the term comparator with collation. While comparator is	1. Replaced the term comparator with collation. While comparator is
	somewhat more precise because these abstract functions are used	somewhat more precise because these abstract functions are used
	for matching as well as ordering, collation is the term used by	for matching as well as ordering, collation is the term used by
	other parts of the industry. Thus I have changed the name to	other parts of the industry. Thus I have changed the name to
	collation for consistency.	collation for consistency.


	2. Remove all modifiers to the basic collation except for the	2. Remove all modifiers to the basic collation except for the
	customization and the match rules. The other behavior	customization and the match rules. The other behavior
	modifications can be specified in a customization of the	modifications can be specified in a customization of the
	collation.	collation.


	3. Use ";" instead of "-" as delimiter between parameters to make	3. Use ";" instead of "-" as delimiter between parameters to make
	names more URL-ish.	names more URL-ish.

	4. Add URL form for comparator reference.	4. Add URL form for comparator reference.


	5. Switched registration template to use XML document.	5. Switched registration template to use XML document.


	6. Added a number of useful registration template elements related	6. Added a number of useful registration template elements related
	to the Unicode Collation Algorithm.	to the Unicode Collation Algorithm.


	7. Switched language from "custom" to "tailor" to match UCA language	7. Switched language from "custom" to "tailor" to match UCA language
	for tailoring of the collation algorithm.	for tailoring of the collation algorithm.


	Normative References	15. References

		15.1. Normative References

	[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement	[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
	Levels", BCP 14, RFC 2119, March 1997.	Levels", BCP 14, RFC 2119, March 1997.

	[2] Crocker, D. and P. Overell, "Augmented BNF for Syntax	[2] Crocker, D. and P. Overell, "Augmented BNF for Syntax

	Specifications: ABNF", RFC 2234, November 1997.	Specifications: ABNF", RFC 4234, October 2005.


	[3] Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC	[3] Yergeau, F., "UTF-8, a transformation format of ISO 10646",
	2279, January 1998.	STD 63, RFC 3629, November 2003.


	[4] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource	[4] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
	Identifiers (URI): Generic Syntax", RFC 2396, August 1998.	Resource Identifier (URI): Generic Syntax", RFC 3986,
		January 2005.


	[5] Alvestrand, H., "Tags for the Identification of Languages", BCP	[5] Alvestrand, H., "Tags for the Identification of Languages",
	47, RFC 3066, January 2001.	BCP 47, RFC 3066, January 2001.

	[6] Hoffman, P. and M. Blanchet, "Preparation of Internationalized	[6] Hoffman, P. and M. Blanchet, "Preparation of Internationalized
	Strings ("stringprep")", RFC 3454, December 2002.	Strings ("stringprep")", RFC 3454, December 2002.

	[7] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for	[7] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for
	Internationalized Domain Names (IDN)", RFC 3491, March 2003.	Internationalized Domain Names (IDN)", RFC 3491, March 2003.

	[8] Davis, M. and K. Whistler, "Unicode Collation Algorithm version	[8] Davis, M. and K. Whistler, "Unicode Collation Algorithm version

	9", July 2002, <http://www.unicode.org/reports/tr10/	14", May 2005,
	tr10-9.html>.	<http://www.unicode.org/reports/tr10/tr10-14.html>.

		[9] Davis, M. and M. Suignard, "Unicode Security Considerations",
		February 2006, <http://www.unicode.org/reports/tr36/>.


	Informative References	15.2. Informative References


	[9] Freed, N. and N. Borenstein, "Multipurpose Internet Mail	[10] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
	Extensions (MIME) Part One: Format of Internet Message Bodies",	Extensions (MIME) Part One: Format of Internet Message Bodies",
	RFC 2045, November 1996.	RFC 2045, November 1996.


	[10] Myers, J., "Simple Authentication and Security Layer (SASL)",	[11] Myers, J., "Simple Authentication and Security Layer (SASL)",
	RFC 2222, October 1997.	RFC 2222, October 1997.


	[11] Newman, C. and J. Myers, "ACAP -- Application Configuration	[12] Newman, C. and J. Myers, "ACAP -- Application Configuration
	Access Protocol", RFC 2244, November 1997.	Access Protocol", RFC 2244, November 1997.


	[12] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
	Considerations Section in RFCs", BCP 26, RFC 2434, October
	1998.

	[13] Resnick, P., "Internet Message Format", RFC 2822, April 2001.	[13] Resnick, P., "Internet Message Format", RFC 2822, April 2001.

	[14] Freed, N. and J. Postel, "IANA Charset Registration	[14] Freed, N. and J. Postel, "IANA Charset Registration
	Procedures", BCP 19, RFC 2978, October 2000.	Procedures", BCP 19, RFC 2978, October 2000.

	[15] Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028,	[15] Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028,
	January 2001.	January 2001.


	URIs	[16] Crispin, M., "Internet Message Access Protocol - Version
		4rev1", RFC 3501, March 2003.


	[16] <http://www.unicode.org/Public/3.2-Update/	[17] Crispin, M. and K. Murchison, "Internet Message Access Protocol
	UnicodeData-3.2.0.txt>	- Sort and Thread Extensions", draft-ietf-imapext-sort-17.txt
		(work in progress), May 2004.

		[18] Newman, C. and A. Gulbrandsen, "Internet Message Access
		Protocol Internationalization", draft-ietf-imapext-i18n-06.txt
		(work in progress), January 2006.


	[17] <http://www.unicode.org/Public/3.2-Update/PropList-3.2.0.txt>	Authors' Addresses

	[18] <http://www.unicode.org/reports/tr10/allkeys-3.1.1.txt>

	Author's Address

	Chris Newman	Chris Newman
	Sun Microsystems	Sun Microsystems
	1050 Lakes Drive	1050 Lakes Drive
	West Covina, CA 91790	West Covina, CA 91790
	US	US


	EMail: chris.newman@sun.com	Email: chris.newman@sun.com

		Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever
		possible, for example as "Dürst" in XML and HTML.)
		Aoyama Gakuin University
		5-10-1 Fuchinobe
		Sagamihara, Kanagawa 229-8558
		Japan

		Phone: +81 42 759 6329
		Fax: +81 42 759 6495
		Email: mailto:duerst@it.aoyama.ac.jp
		URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/

		Arnt Gulbrandsen
		Oryx Mail Systems GmbH
		Schweppermannstr. 8
		Munich 81671
		Germany

		Phone: +49 89 4502 9757
		Fax: +49 89 4502 9758
		Email: mailto:arnt@oryx.com
		URI: http://www.oryx.com/arnt/

	Intellectual Property Statement	Intellectual Property Statement

	The IETF takes no position regarding the validity or scope of any	The IETF takes no position regarding the validity or scope of any

	intellectual property or other rights that might be claimed to	Intellectual Property Rights or other rights that might be claimed to
	pertain to the implementation or use of the technology described in	pertain to the implementation or use of the technology described in
	this document or the extent to which any license under such rights	this document or the extent to which any license under such rights

	might or might not be available; neither does it represent that it	might or might not be available; nor does it represent that it has
	has made any effort to identify any such rights. Information on the	made any independent effort to identify any such rights. Information
	IETF's procedures with respect to rights in standards-track and	on the procedures with respect to rights in RFC documents can be
	standards-related documentation can be found in BCP-11. Copies of	found in BCP 78 and BCP 79.
	claims of rights made available for publication and any assurances of
	licenses to be made available, or the result of an attempt made to	Copies of IPR disclosures made to the IETF Secretariat and any
	obtain a general license or permission for the use of such	assurances of licenses to be made available, or the result of an
	proprietary rights by implementors or users of this specification can	attempt made to obtain a general license or permission for the use of
	be obtained from the IETF Secretariat.	such proprietary rights by implementers or users of this
		specification can be obtained from the IETF on-line IPR repository at
		http://www.ietf.org/ipr.

	The IETF invites any interested party to bring to its attention any	The IETF invites any interested party to bring to its attention any
	copyrights, patents or patent applications, or other proprietary	copyrights, patents or patent applications, or other proprietary

	rights which may cover technology that may be required to practice	rights that may cover technology that may be required to implement
	this standard. Please address the information to the IETF Executive	this standard. Please address the information to the IETF at
	Director.	ietf-ipr@ietf.org.

	Full Copyright Statement	Disclaimer of Validity

	Copyright (C) The Internet Society (2003). All Rights Reserved.	This document and the information contained herein are provided on an
		"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
	This document and translations of it may be copied and furnished to	OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
	others, and derivative works that comment on or otherwise explain it	ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
	or assist in its implementation may be prepared, copied, published	INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
	and distributed, in whole or in part, without restriction of any	INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
	kind, provided that the above copyright notice and this paragraph are	WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
	included on all such copies and derivative works. However, this
	document itself may not be modified in any way, such as by removing	Copyright Statement
	the copyright notice or references to the Internet Society or other
	Internet organizations, except as needed for the purpose of	Copyright (C) The Internet Society (2006). This document is subject
	developing Internet standards in which case the procedures for	to the rights, licenses and restrictions contained in BCP 78, and
	copyrights defined in the Internet Standards process must be	except as set forth therein, the authors retain all their rights.
	followed, or as required to translate it into languages other than
	English.

	The limited permissions granted above are perpetual and will not be
	revoked by the Internet Society or its successors or assignees.

	This document and the information contained herein is provided on an
	"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
	TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
	BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
	HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
	MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

	Acknowledgment	Acknowledgment

	Funding for the RFC Editor function is currently provided by the	Funding for the RFC Editor function is currently provided by the
	Internet Society.	Internet Society.

End of changes. 130 change blocks.
	819 lines changed or deleted	1082 lines changed or added