···8080opam exec -- dune build @doc
8181```

+## Technical Standards
+
+This library is built on the following Internet standards:
+
+- **[RFC 1034](https://datatracker.ietf.org/doc/html/rfc1034)** - Domain Names: Concepts and Facilities
+- **[RFC 1035](https://datatracker.ietf.org/doc/html/rfc1035)** - Domain Names: Implementation and Specification
+- **[RFC 3492](https://datatracker.ietf.org/doc/html/rfc3492)** - Punycode: A Bootstring encoding of Unicode for IDNA
+- **[RFC 5890](https://datatracker.ietf.org/doc/html/rfc5890)** - IDNA: Definitions and Document Framework
+- **[RFC 5891](https://datatracker.ietf.org/doc/html/rfc5891)** - IDNA: Protocol
+
+RFC specifications are available in the `spec/` directory for reference.
+
 ## License

 ISC
lib/publicsuffix.mli (+46 -5)
···
  For example, for the domain [www.example.com], the public suffix is [.com]
  and the registrable domain is [example.com].

+ Domain names follow the specifications in {{:https://datatracker.ietf.org/doc/html/rfc1034}RFC 1034}
+ and {{:https://datatracker.ietf.org/doc/html/rfc1035}RFC 1035}, which define
+ the Domain Name System concepts and implementation.
+
  {1 Sections}

  The PSL is divided into two sections:
···
  {1 Internationalized Domain Names}

  The library handles internationalized domain names (IDN) by converting them
- to Punycode (ASCII-compatible encoding) before lookup. Both Unicode and
- Punycode input are accepted:
+ to Punycode (ASCII-compatible encoding) before lookup, following the IDNA2008
+ protocol defined in {{:https://datatracker.ietf.org/doc/html/rfc5890}RFC 5890}
+ (IDNA Definitions) and {{:https://datatracker.ietf.org/doc/html/rfc5891}RFC 5891}
+ (IDNA Protocol).
+
+ Punycode encoding, specified in {{:https://datatracker.ietf.org/doc/html/rfc3492}RFC 3492},
+ uniquely and reversibly transforms Unicode strings into ASCII strings.
+ IDNA marks each encoded label with the "xn--" prefix (ACE prefix). Both
+ Unicode and Punycode input are accepted:

  {[
  Publicsuffix.registrable_domain psl "www.食狮.com.cn"
···
  Publicsuffix.public_suffix psl "example.com."
  (* Returns: Ok "com." *)
  ]}
+
+ {1 References}
+
+ This implementation is based on the following specifications:
+
+ {ul
+ {- {{:https://publicsuffix.org/list/} Public Suffix List Specification} - The algorithm and list format}
+ {- {{:https://datatracker.ietf.org/doc/html/rfc1034}RFC 1034} - Domain Names: Concepts and Facilities}
+ {- {{:https://datatracker.ietf.org/doc/html/rfc1035}RFC 1035} - Domain Names: Implementation and Specification}
+ {- {{:https://datatracker.ietf.org/doc/html/rfc3492}RFC 3492} - Punycode: A Bootstring encoding of Unicode for IDNA}
+ {- {{:https://datatracker.ietf.org/doc/html/rfc5890}RFC 5890} - Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework}
+ {- {{:https://datatracker.ietf.org/doc/html/rfc5891}RFC 5891} - Internationalized Domain Names in Applications (IDNA): Protocol}}
 *)

 (** {1 Types} *)
···
  | Empty_domain
  (** The input domain was empty *)
  | Invalid_domain of string
- (** The domain could not be parsed as a valid domain name *)
+ (** The domain could not be parsed as a valid domain name.
+ Domain names must conform to the syntax specified in
+ {{:https://datatracker.ietf.org/doc/html/rfc1035}RFC 1035}. *)
  | Leading_dot
- (** The domain has a leading dot (e.g., [.example.com]) *)
+ (** The domain has a leading dot (e.g., [.example.com]).
+ Per {{:https://datatracker.ietf.org/doc/html/rfc1035}RFC 1035},
+ domain names should not have leading dots. *)
  | Punycode_error of string
- (** Failed to convert internationalized domain to Punycode *)
+ (** Failed to convert internationalized domain to Punycode encoding.
+ See {{:https://datatracker.ietf.org/doc/html/rfc3492}RFC 3492}
+ for Punycode encoding requirements and
+ {{:https://datatracker.ietf.org/doc/html/rfc5891}RFC 5891}
+ for IDNA protocol requirements. *)
  | No_public_suffix
  (** The domain has no public suffix (should not happen with valid domains) *)
  | Domain_is_public_suffix
···
  - Exception rules ([!]) take priority over all other rules
  - If no rules match, the implicit [*] rule applies (returns the TLD)

+ Domain names are processed according to
+ {{:https://datatracker.ietf.org/doc/html/rfc1035}RFC 1035} syntax.
+ Internationalized domain names (IDN) are automatically converted to
+ Punycode per {{:https://datatracker.ietf.org/doc/html/rfc3492}RFC 3492}
+ before matching.
+
  @param t The PSL instance
  @param domain The domain name to query (Unicode or Punycode)
  @return [Ok suffix] with the public suffix, or [Error e] on failure
···

  The registrable domain is the public suffix plus one additional label.
  This is the highest-level domain that can be registered by a user.
+
+ Domain labels follow the naming restrictions specified in
+ {{:https://datatracker.ietf.org/doc/html/rfc1035}RFC 1035}. Internationalized
+ domain names are handled per {{:https://datatracker.ietf.org/doc/html/rfc5891}RFC 5891}.

  @param t The PSL instance
  @param domain The domain name to query
spec/rfc1034.txt (+3077)
···11+Network Working Group P. Mockapetris
22+Request for Comments: 1034 ISI
33+Obsoletes: RFCs 882, 883, 973 November 1987
44+55+66+ DOMAIN NAMES - CONCEPTS AND FACILITIES
77+88+99+1010+1. STATUS OF THIS MEMO
1111+1212+This RFC is an introduction to the Domain Name System (DNS), and omits
1313+many details which can be found in a companion RFC, "Domain Names -
1414+Implementation and Specification" [RFC-1035]. That RFC assumes that the
1515+reader is familiar with the concepts discussed in this memo.
1616+1717+A subset of DNS functions and data types constitute an official
1818+protocol. The official protocol includes standard queries and their
1919+responses and most of the Internet class data formats (e.g., host
2020+addresses).
2121+2222+However, the domain system is intentionally extensible. Researchers are
2323+continuously proposing, implementing and experimenting with new data
2424+types, query types, classes, functions, etc. Thus while the components
2525+of the official protocol are expected to stay essentially unchanged and
2626+operate as a production service, experimental behavior should always be
2727+expected in extensions beyond the official protocol. Experimental or
2828+obsolete features are clearly marked in these RFCs, and such information
2929+should be used with caution.
3030+3131+The reader is especially cautioned not to depend on the values which
3232+appear in examples to be current or complete, since their purpose is
3333+primarily pedagogical. Distribution of this memo is unlimited.
3434+3535+2. INTRODUCTION
3636+3737+This RFC introduces domain style names, their use for Internet mail and
3838+host address support, and the protocols and servers used to implement
3939+domain name facilities.
4040+4141+2.1. The history of domain names
4242+4343+The impetus for the development of the domain system was growth in the
4444+Internet:
4545+4646+ - Host name to address mappings were maintained by the Network
4747+ Information Center (NIC) in a single file (HOSTS.TXT) which
4848+ was FTPed by all hosts [RFC-952, RFC-953]. The total network
4949+5050+5151+5252+Mockapetris [Page 1]
5353+5454+RFC 1034 Domain Concepts and Facilities November 1987
5555+5656+5757+ bandwidth consumed in distributing a new version by this
5858+ scheme is proportional to the square of the number of hosts in
5959+ the network, and even when multiple levels of FTP are used,
6060+ the outgoing FTP load on the NIC host is considerable.
6161+ Explosive growth in the number of hosts didn't bode well for
6262+ the future.
6363+6464+ - The network population was also changing in character. The
6565+ timeshared hosts that made up the original ARPANET were being
6666+ replaced with local networks of workstations. Local
6767+ organizations were administering their own names and
6868+ addresses, but had to wait for the NIC to change HOSTS.TXT to
6969+ make changes visible to the Internet at large. Organizations
7070+ also wanted some local structure on the name space.
7171+7272+ - The applications on the Internet were getting more
7373+ sophisticated and creating a need for general purpose name
7474+ service.
7575+7676+7777+The result was several ideas about name spaces and their management
7878+[IEN-116, RFC-799, RFC-819, RFC-830]. The proposals varied, but a
7979+common thread was the idea of a hierarchical name space, with the
8080+hierarchy roughly corresponding to organizational structure, and names
8181+using "." as the character to mark the boundary between hierarchy
8282+levels. A design using a distributed database and generalized resources
8383+was described in [RFC-882, RFC-883]. Based on experience with several
8484+implementations, the system evolved into the scheme described in this
8585+memo.
8686+8787+The terms "domain" or "domain name" are used in many contexts beyond the
8888+DNS described here. Very often, the term domain name is used to refer
8989+to a name with structure indicated by dots, but no relation to the DNS.
9090+This is particularly true in mail addressing [Quarterman 86].
9191+9292+2.2. DNS design goals
9393+9494+The design goals of the DNS influence its structure. They are:
9595+9696+ - The primary goal is a consistent name space which will be used
9797+ for referring to resources. In order to avoid the problems
9898+ caused by ad hoc encodings, names should not be required to
9999+ contain network identifiers, addresses, routes, or similar
100100+ information as part of the name.
101101+102102+ - The sheer size of the database and frequency of updates
103103+ suggest that it must be maintained in a distributed manner,
104104+ with local caching to improve performance. Approaches that
111111+112112+113113+ attempt to collect a consistent copy of the entire database
114114+ will become more and more expensive and difficult, and hence
115115+ should be avoided. The same principle holds for the structure
116116+ of the name space, and in particular mechanisms for creating
117117+ and deleting names; these should also be distributed.
118118+119119+ - Where there are tradeoffs between the cost of acquiring data, the
120120+ speed of updates, and the accuracy of caches, the source of
121121+ the data should control the tradeoff.
122122+123123+ - The costs of implementing such a facility dictate that it be
124124+ generally useful, and not restricted to a single application.
125125+ We should be able to use names to retrieve host addresses,
126126+ mailbox data, and other as yet undetermined information. All
127127+ data associated with a name is tagged with a type, and queries
128128+ can be limited to a single type.
129129+130130+ - Because we want the name space to be useful in dissimilar
131131+ networks and applications, we provide the ability to use the
132132+ same name space with different protocol families or
133133+ management. For example, host address formats differ between
134134+ protocols, though all protocols have the notion of address.
135135+ The DNS tags all data with a class as well as the type, so
136136+ that we can allow parallel use of different formats for data
137137+ of type address.
138138+139139+ - We want name server transactions to be independent of the
140140+ communications system that carries them. Some systems may
141141+ wish to use datagrams for queries and responses, and only
142142+ establish virtual circuits for transactions that need the
143143+ reliability (e.g., database updates, long transactions); other
144144+ systems will use virtual circuits exclusively.
145145+146146+ - The system should be useful across a wide spectrum of host
147147+ capabilities. Both personal computers and large timeshared
148148+ hosts should be able to use the system, though perhaps in
149149+ different ways.
150150+151151+2.3. Assumptions about usage
152152+153153+The organization of the domain system derives from some assumptions
154154+about the needs and usage patterns of its user community and is designed
155155+to avoid many of the complicated problems found in general purpose
156156+database systems.
157157+158158+The assumptions are:
159159+160160+ - The size of the total database will initially be proportional
167167+168168+169169+ to the number of hosts using the system, but will eventually
170170+ grow to be proportional to the number of users on those hosts
171171+ as mailboxes and other information are added to the domain
172172+ system.
173173+174174+ - Most of the data in the system will change very slowly (e.g.,
175175+ mailbox bindings, host addresses), but that the system should
176176+ be able to deal with subsets that change more rapidly (on the
177177+ order of seconds or minutes).
178178+179179+ - The administrative boundaries used to distribute
180180+ responsibility for the database will usually correspond to
181181+ organizations that have one or more hosts. Each organization
182182+ that has responsibility for a particular set of domains will
183183+ provide redundant name servers, either on the organization's
184184+ own hosts or other hosts that the organization arranges to
185185+ use.
186186+187187+ - Clients of the domain system should be able to identify
188188+ trusted name servers they prefer to use before accepting
189189+ referrals to name servers outside of this "trusted" set.
190190+191191+ - Access to information is more critical than instantaneous
192192+ updates or guarantees of consistency. Hence the update
193193+ process allows updates to percolate out through the users of
194194+ the domain system rather than guaranteeing that all copies are
195195+ simultaneously updated. When updates are unavailable due to
196196+ network or host failure, the usual course is to believe old
197197+ information while continuing efforts to update it. The
198198+ general model is that copies are distributed with timeouts for
199199+ refreshing. The distributor sets the timeout value and the
200200+ recipient of the distribution is responsible for performing
201201+ the refresh. In special situations, very short intervals can
202202+ be specified, or the owner can prohibit copies.
203203+204204+ - In any system that has a distributed database, a particular
205205+ name server may be presented with a query that can only be
206206+ answered by some other server. The two general approaches to
207207+ dealing with this problem are "recursive", in which the first
208208+ server pursues the query for the client at another server, and
209209+ "iterative", in which the server refers the client to another
210210+ server and lets the client pursue the query. Both approaches
211211+ have advantages and disadvantages, but the iterative approach
212212+ is preferred for the datagram style of access. The domain
213213+ system requires implementation of the iterative approach, but
214214+ allows the recursive approach as an option.
223223+224224+225225+The domain system assumes that all data originates in master files
226226+scattered through the hosts that use the domain system. These master
227227+files are updated by local system administrators. Master files are text
228228+files that are read by a local name server, and hence become available
229229+through the name servers to users of the domain system. The user
230230+programs access name servers through standard programs called resolvers.
231231+232232+The standard format of master files allows them to be exchanged between
233233+hosts (via FTP, mail, or some other mechanism); this facility is useful
234234+when an organization wants a domain, but doesn't want to support a name
235235+server. The organization can maintain the master files locally using a
236236+text editor, transfer them to a foreign host which runs a name server,
237237+and then arrange with the system administrator of the name server to get
238238+the files loaded.
239239+240240+Each host's name servers and resolvers are configured by a local system
241241+administrator [RFC-1033]. For a name server, this configuration data
242242+includes the identity of local master files and instructions on which
243243+non-local master files are to be loaded from foreign servers. The name
244244+server uses the master files or copies to load its zones. For
245245+resolvers, the configuration data identifies the name servers which
246246+should be the primary sources of information.
247247+248248+The domain system defines procedures for accessing the data and for
249249+referrals to other name servers. The domain system also defines
250250+procedures for caching retrieved data and for periodic refreshing of
251251+data defined by the system administrator.
252252+253253+The system administrators provide:
254254+255255+ - The definition of zone boundaries.
256256+257257+ - Master files of data.
258258+259259+ - Updates to master files.
260260+261261+ - Statements of the refresh policies desired.
262262+263263+The domain system provides:
264264+265265+ - Standard formats for resource data.
266266+267267+ - Standard methods for querying the database.
268268+269269+ - Standard methods for name servers to refresh local data from
270270+ foreign name servers.
279279+280280+281281+2.4. Elements of the DNS
282282+283283+The DNS has three major components:
284284+285285+ - The DOMAIN NAME SPACE and RESOURCE RECORDS, which are
286286+ specifications for a tree structured name space and data
287287+ associated with the names. Conceptually, each node and leaf
288288+ of the domain name space tree names a set of information, and
289289+ query operations are attempts to extract specific types of
290290+ information from a particular set. A query names the domain
291291+ name of interest and describes the type of resource
292292+ information that is desired. For example, the Internet
293293+ uses some of its domain names to identify hosts; queries for
294294+ address resources return Internet host addresses.
295295+296296+ - NAME SERVERS are server programs which hold information about
297297+ the domain tree's structure and set information. A name
298298+ server may cache structure or set information about any part
299299+ of the domain tree, but in general a particular name server
300300+ has complete information about a subset of the domain space,
301301+ and pointers to other name servers that can be used to lead to
302302+ information from any part of the domain tree. Name servers
303303+ know the parts of the domain tree for which they have complete
304304+ information; a name server is said to be an AUTHORITY for
305305+ these parts of the name space. Authoritative information is
306306+ organized into units called ZONEs, and these zones can be
307307+ automatically distributed to the name servers which provide
308308+ redundant service for the data in a zone.
309309+310310+ - RESOLVERS are programs that extract information from name
311311+ servers in response to client requests. Resolvers must be
312312+ able to access at least one name server and use that name
313313+ server's information to answer a query directly, or pursue the
314314+ query using referrals to other name servers. A resolver will
315315+ typically be a system routine that is directly accessible to
316316+ user programs; hence no protocol is necessary between the
317317+ resolver and the user program.
318318+319319+These three components roughly correspond to the three layers or views
320320+of the domain system:
321321+322322+ - From the user's point of view, the domain system is accessed
323323+ through a simple procedure or OS call to a local resolver.
324324+ The domain space consists of a single tree and the user can
325325+ request information from any section of the tree.
326326+327327+ - From the resolver's point of view, the domain system is
328328+ composed of an unknown number of name servers. Each name
335335+336336+337337+ server has one or more pieces of the whole domain tree's data,
338338+ but the resolver views each of these databases as essentially
339339+ static.
340340+341341+ - From a name server's point of view, the domain system consists
342342+ of separate sets of local information called zones. The name
343343+ server has local copies of some of the zones. The name server
344344+ must periodically refresh its zones from master copies in
345345+ local files or foreign name servers. The name server must
346346+ concurrently process queries that arrive from resolvers.
347347+348348+In the interests of performance, implementations may couple these
349349+functions. For example, a resolver on the same machine as a name server
350350+might share a database consisting of the zones managed by the name
351351+server and the cache managed by the resolver.
352352+353353+3. DOMAIN NAME SPACE and RESOURCE RECORDS
354354+355355+3.1. Name space specifications and terminology
356356+357357+The domain name space is a tree structure. Each node and leaf on the
358358+tree corresponds to a resource set (which may be empty). The domain
359359+system makes no distinctions between the uses of the interior nodes and
360360+leaves, and this memo uses the term "node" to refer to both.
361361+362362+Each node has a label, which is zero to 63 octets in length. Brother
363363+nodes may not have the same label, although the same label can be used
364364+for nodes which are not brothers. One label is reserved, and that is
365365+the null (i.e., zero length) label used for the root.
366366+367367+The domain name of a node is the list of the labels on the path from the
368368+node to the root of the tree. By convention, the labels that compose a
369369+domain name are printed or read left to right, from the most specific
370370+(lowest, farthest from the root) to the least specific (highest, closest
371371+to the root).
372372+373373+Internally, programs that manipulate domain names should represent them
374374+as sequences of labels, where each label is a length octet followed by
375375+an octet string. Because all domain names end at the root, which has a
376376+null string for a label, these internal representations can use a length
377377+byte of zero to terminate a domain name.
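The internal representation described above (a length octet before each label, with a zero length octet for the root) can be sketched in OCaml. This is a minimal illustration, not code from this library: `encode_name` is a hypothetical helper, it assumes dots only ever separate labels (no escaped dots), and it enforces the 63-octet label limit stated in section 3.1.

```ocaml
(* Hypothetical helper: encode a dotted textual name as length-prefixed
   labels, terminated by the zero-length root label. *)
let encode_name (name : string) : (string, string) result =
  (* Drop one trailing dot so "ISI.EDU." and "ISI.EDU" encode alike. *)
  let n = String.length name in
  let name =
    if n > 0 && name.[n - 1] = '.' then String.sub name 0 (n - 1) else name
  in
  let labels = if name = "" then [] else String.split_on_char '.' name in
  let buf = Buffer.create 64 in
  let rec go = function
    | [] ->
      (* The root label is the empty string, so its length octet is zero. *)
      Buffer.add_char buf '\000';
      Ok (Buffer.contents buf)
    | l :: rest ->
      let len = String.length l in
      if len = 0 then Error "empty label"
      else if len > 63 then Error "label longer than 63 octets"
      else begin
        Buffer.add_char buf (Char.chr len);
        Buffer.add_string buf l;
        go rest
      end
  in
  go labels
```

Under these assumptions, `encode_name "ISI.EDU."` yields the octets `3 I S I 3 E D U 0`, and the root name `"."` yields a single zero octet.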
378378+379379+By convention, domain names can be stored with arbitrary case, but
380380+domain name comparisons for all present domain functions are done in a
381381+case-insensitive manner, assuming an ASCII character set, and a high
382382+order zero bit. This means that you are free to create a node with
383383+label "A" or a node with label "a", but not both as brothers; you could
384384+refer to either using "a" or "A". When you receive a domain name or
391391+392392+393393+label, you should preserve its case. The rationale for this choice is
394394+that we may someday need to add full binary domain names for new
395395+services; existing services would not be changed.
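The comparison rule above (store any case, compare without regard to ASCII case) might look like the following sketch. The function names are illustrative and not part of this library; names are assumed to be held as lists of labels, and the inputs are never rewritten, so the received case is preserved.

```ocaml
(* Compare two labels ignoring ASCII case only; non-ASCII octets are
   compared as-is, matching the "assuming an ASCII character set" rule. *)
let equal_label (a : string) (b : string) : bool =
  String.equal (String.lowercase_ascii a) (String.lowercase_ascii b)

(* Compare two names given as label lists, most specific label first. *)
let equal_name (a : string list) (b : string list) : bool =
  List.length a = List.length b && List.for_all2 equal_label a b
```

Lowercasing happens only inside the comparison, so a caller that later prints or forwards the name still has the original spelling.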
396396+397397+When a user needs to type a domain name, the length of each label is
398398+omitted and the labels are separated by dots ("."). Since a complete
399399+domain name ends with the root label, this leads to a printed form which
400400+ends in a dot. We use this property to distinguish between:
401401+402402+ - a character string which represents a complete domain name
403403+ (often called "absolute"). For example, "poneria.ISI.EDU."
404404+405405+ - a character string that represents the starting labels of a
406406+ domain name which is incomplete, and should be completed by
407407+ local software using knowledge of the local domain (often
408408+ called "relative"). For example, "poneria" used in the
409409+ ISI.EDU domain.
410410+411411+Relative names are either taken relative to a well known origin, or to a
412412+list of domains used as a search list. Relative names appear mostly at
413413+the user interface, where their interpretation varies from
414414+implementation to implementation, and in master files, where they are
415415+relative to a single origin domain name. The most common interpretation
416416+uses the root "." as either the single origin or as one of the members
417417+of the search list, so a multi-label relative name is often one where
418418+the trailing dot has been omitted to save typing.
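A small sketch of the absolute/relative distinction, assuming the printed form described above where a trailing dot marks a complete name. `classify` and `complete` are hypothetical names, escaped dots are ignored, and the origin passed to `complete` is assumed to be absolute already.

```ocaml
type name_kind = Absolute | Relative

(* A printed name is absolute exactly when it ends with the root's dot. *)
let classify (s : string) : name_kind =
  let n = String.length s in
  if n > 0 && s.[n - 1] = '.' then Absolute else Relative

(* Complete a relative name against a single, absolute origin. *)
let complete ~(origin : string) (s : string) : string =
  match classify s with
  | Absolute -> s
  | Relative ->
    if s = "" then origin
    else if origin = "." then s ^ "."
    else s ^ "." ^ origin
```

With origin `"ISI.EDU."`, the relative name `"poneria"` completes to `"poneria.ISI.EDU."`; with the root origin `"."`, completion only appends the omitted trailing dot. Search lists are not modelled here.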
419419+420420+To simplify implementations, the total number of octets that represent a
421421+domain name (i.e., the sum of all label octets and label lengths) is
422422+limited to 255.
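The size limits can be checked with a few lines of OCaml. This sketch assumes a name is given as its non-root labels and that the 255-octet total counts one length octet per label plus the root's zero length octet; `wire_length` and `fits_limits` are illustrative names.

```ocaml
(* Octets needed for the name: each label costs its length octet plus its
   contents, and the terminating root label costs one more octet. *)
let wire_length (labels : string list) : int =
  List.fold_left (fun acc l -> acc + 1 + String.length l) 1 labels

(* Check both the per-label limit (1 to 63 octets) and the total limit. *)
let fits_limits (labels : string list) : bool =
  List.for_all
    (fun l -> let n = String.length l in n >= 1 && n <= 63)
    labels
  && wire_length labels <= 255
```

For `["ISI"; "EDU"]` the total is 4 + 4 + 1 = 9 octets. Four 63-octet labels would need 257 octets and are rejected.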
423423+424424+A domain is identified by a domain name, and consists of that part of
425425+the domain name space that is at or below the domain name which
426426+specifies the domain. A domain is a subdomain of another domain if it
427427+is contained within that domain. This relationship can be tested by
428428+seeing if the subdomain's name ends with the containing domain's name.
429429+For example, A.B.C.D is a subdomain of B.C.D, C.D, D, and " ".
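The "ends with" test is easiest to get right on labels rather than on raw strings, since a plain string suffix check would also accept `XB.C.D` as a subdomain of `B.C.D`. A minimal sketch, assuming names are label lists with the most specific label first and the root is the empty list; `is_subdomain` is a hypothetical helper, and a name counts as contained in itself.

```ocaml
(* True when [name] lies at or below [parent] in the tree, comparing
   labels from the root end and ignoring ASCII case. *)
let is_subdomain ~(parent : string list) (name : string list) : bool =
  let eq a b = String.lowercase_ascii a = String.lowercase_ascii b in
  let rec prefix p l =
    match p, l with
    | [], _ -> true
    | _, [] -> false
    | x :: p', y :: l' -> eq x y && prefix p' l'
  in
  prefix (List.rev parent) (List.rev name)
```

With this definition `["A"; "B"; "C"; "D"]` is a subdomain of `["B"; "C"; "D"]`, `["C"; "D"]`, `["D"]`, and the root `[]`, matching the example in the text.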
430430+431431+3.2. Administrative guidelines on use
432432+433433+As a matter of policy, the DNS technical specifications do not mandate a
434434+particular tree structure or rules for selecting labels; its goal is to
435435+be as general as possible, so that it can be used to build arbitrary
436436+applications. In particular, the system was designed so that the name
437437+space did not have to be organized along the lines of network
438438+boundaries, name servers, etc. The rationale for this is not that the
439439+name space should have no implied semantics, but rather that the choice
440440+of implied semantics should be left open to be used for the problem at
447447+448448+449449+hand, and that different parts of the tree can have different implied
450450+semantics. For example, the IN-ADDR.ARPA domain is organized and
451451+distributed by network and host address because its role is to translate
452452+from network or host numbers to names; NetBIOS domains [RFC-1001, RFC-
453453+1002] are flat because that is appropriate for that application.
454454+455455+However, there are some guidelines that apply to the "normal" parts of
456456+the name space used for hosts, mailboxes, etc., that will make the name
457457+space more uniform, provide for growth, and minimize problems as
458458+software is converted from the older host table. The political
459459+decisions about the top levels of the tree originated in RFC-920.
460460+Current policy for the top levels is discussed in [RFC-1032]. MILNET
461461+conversion issues are covered in [RFC-1031].
462462+463463+Lower domains which will eventually be broken into multiple zones should
464464+provide branching at the top of the domain so that the eventual
465465+decomposition can be done without renaming. Node labels which use
466466+special characters, leading digits, etc., are likely to break older
467467+software which depends on more restrictive choices.
468468+469469+3.3. Technical guidelines on use
470470+471471+Before the DNS can be used to hold naming information for some kind of
472472+object, two needs must be met:
473473+474474+ - A convention for mapping between object names and domain
475475+ names. This describes how information about an object is
476476+ accessed.
477477+478478+ - RR types and data formats for describing the object.
479479+480480+These rules can be quite simple or fairly complex. Very often, the
481481+designer must take into account existing formats and plan for upward
482482+compatibility for existing usage. Multiple mappings or levels of
483483+mapping may be required.
484484+485485+For hosts, the mapping depends on the existing syntax for host names
486486+which is a subset of the usual text representation for domain names,
487487+together with RR formats for describing host addresses, etc. Because we
488488+need a reliable inverse mapping from address to host name, a special
489489+mapping for addresses into the IN-ADDR.ARPA domain is also defined.
490490+491491+For mailboxes, the mapping is slightly more complex. The usual mail
492492+address <local-part>@<mail-domain> is mapped into a domain name by
493493+converting <local-part> into a single label (regardless of dots it
494494+contains), converting <mail-domain> into a domain name using the usual
495495+text format for domain names (dots denote label breaks), and
496496+concatenating the two to form a single domain name. Thus the mailbox
503503+504504+505505+HOSTMASTER@SRI-NIC.ARPA is represented as a domain name by
506506+HOSTMASTER.SRI-NIC.ARPA. An appreciation for the reasons behind this
507507+design also must take into account the scheme for mail exchanges [RFC-
508508+974].
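The mailbox mapping can be illustrated as follows. This is a sketch under simplifying assumptions: the address is split at its last `@`, the result is returned as a label list so that dots inside the local part need no escaping, and `mailbox_to_labels` is a made-up name rather than an API of this library.

```ocaml
(* Map <local-part>@<mail-domain> to a domain name: the local part becomes
   one label (dots and all), followed by the labels of the mail domain. *)
let mailbox_to_labels (addr : string) : string list option =
  match String.rindex_opt addr '@' with
  | None -> None
  | Some i ->
    let local = String.sub addr 0 i in
    let domain = String.sub addr (i + 1) (String.length addr - i - 1) in
    if local = "" || domain = "" then None
    else Some (local :: String.split_on_char '.' domain)
```

`HOSTMASTER@SRI-NIC.ARPA` maps to the labels `HOSTMASTER`, `SRI-NIC`, `ARPA`, which print as `HOSTMASTER.SRI-NIC.ARPA`.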
509509+510510+The typical user is not concerned with defining these rules, but should
511511+understand that they usually are the result of numerous compromises
512512+between desires for upward compatibility with old usage, interactions
513513+between different object definitions, and the inevitable urge to add new
514514+features when defining the rules. The way the DNS is used to support
515515+some object is often more crucial than the restrictions inherent in the
516516+DNS.
517517+518518+3.4. Example name space
519519+520520+The following figure shows a part of the current domain name space, and
521521+is used in many examples in this RFC. Note that the tree is a very
522522+small subset of the actual name space.
523523+524524+ |
525525+ |
526526+ +---------------------+------------------+
527527+ | | |
528528+ MIL EDU ARPA
529529+ | | |
530530+ | | |
531531+ +-----+-----+ | +------+-----+-----+
532532+ | | | | | | |
533533+ BRL NOSC DARPA | IN-ADDR SRI-NIC ACC
534534+ |
535535+ +--------+------------------+---------------+--------+
536536+ | | | | |
537537+ UCI MIT | UDEL YALE
538538+ | ISI
539539+ | |
540540+ +---+---+ |
541541+ | | |
542542+ LCS ACHILLES +--+-----+-----+--------+
543543+ | | | | | |
544544+ XX A C VAXA VENERA Mockapetris
545545+546546+In this example, the root domain has three immediate subdomains: MIL,
547547+EDU, and ARPA. The LCS.MIT.EDU domain has one immediate subdomain named
548548+XX.LCS.MIT.EDU. All of the leaves are also domains.
549549+550550+3.5. Preferred name syntax
551551+552552+The DNS specifications attempt to be as general as possible in the rules
559559+560560+561561+for constructing domain names. The idea is that the name of any
562562+existing object can be expressed as a domain name with minimal changes.
563563+However, when assigning a domain name for an object, the prudent user
564564+will select a name which satisfies both the rules of the domain system
565565+and any existing rules for the object, whether these rules are published
566566+or implied by existing programs.
567567+568568+For example, when naming a mail domain, the user should satisfy both the
569569+rules of this memo and those in RFC-822. When creating a new host name,
570570+the old rules for HOSTS.TXT should be followed. This avoids problems
571571+when old software is converted to use domain names.
572572+573573+The following syntax will result in fewer problems with many
574574+applications that use domain names (e.g., mail, TELNET).
575575+576576+<domain> ::= <subdomain> | " "
577577+578578+<subdomain> ::= <label> | <subdomain> "." <label>
579579+580580+<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
581581+582582+<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
583583+584584+<let-dig-hyp> ::= <let-dig> | "-"
585585+586586+<let-dig> ::= <letter> | <digit>
587587+588588+<letter> ::= any one of the 52 alphabetic characters A through Z in
589589+upper case and a through z in lower case
590590+591591+<digit> ::= any one of the ten digits 0 through 9
592592+593593+Note that while upper and lower case letters are allowed in domain
594594+names, no significance is attached to the case. That is, two names with
595595+the same spelling but different case are to be treated as if identical.
596596+597597+The labels must follow the rules for ARPANET host names. They must
598598+start with a letter, end with a letter or digit, and have as interior
599599+characters only letters, digits, and hyphen. There are also some
600600+restrictions on the length. Labels must be 63 characters or less.
601601+602602+For example, the following strings identify hosts in the Internet:
603603+604604+A.ISI.EDU XX.LCS.MIT.EDU SRI-NIC.ARPA
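
As a rough illustration of the grammar above, the following OCaml sketch
checks a name against the preferred syntax and compares names without
regard to case. The function names (is_preferred_label,
is_preferred_domain, equal_names) are assumptions made for this example
and are not defined by this memo or by any library. The sketch assumes
names are written without a trailing dot.

```ocaml
(* Sketch of the preferred name syntax in section 3.5.
   Function names are illustrative assumptions, not part of the RFC. *)

let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_digit c = c >= '0' && c <= '9'
let is_let_dig c = is_letter c || is_digit c
let is_let_dig_hyp c = is_let_dig c || c = '-'

(* True when predicate [p] holds for every character of [s]. *)
let for_all_chars p s =
  let rec go i = i >= String.length s || (p s.[i] && go (i + 1)) in
  go 0

(* <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ], 63 characters or less. *)
let is_preferred_label s =
  let n = String.length s in
  n >= 1 && n <= 63
  && is_letter s.[0]
  && is_let_dig s.[n - 1]
  && for_all_chars is_let_dig_hyp s

(* <domain> ::= <subdomain> | " " ; a single blank stands for the root. *)
let is_preferred_domain s =
  s = " " || List.for_all is_preferred_label (String.split_on_char '.' s)

(* Case carries no significance when comparing names. *)
let equal_names a b = String.lowercase_ascii a = String.lowercase_ascii b
```

Under these assumptions the three host names shown above are accepted,
while a name such as 3COM.COM is rejected because its first label does
not start with a letter.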
605605+606606+3.6. Resource Records
607607+608608+A domain name identifies a node. Each node has a set of resource
615615+616616+617617+information, which may be empty. The set of resource information
618618+associated with a particular name is composed of separate resource
619619+records (RRs). The order of RRs in a set is not significant, and need
620620+not be preserved by name servers, resolvers, or other parts of the DNS.
621621+622622+When we talk about a specific RR, we assume it has the following:
623623+624624+owner which is the domain name where the RR is found.
625625+626626+type which is an encoded 16 bit value that specifies the type
627627+ of the resource in this resource record. Types refer to
628628+ abstract resources.
629629+630630+ This memo uses the following types:
631631+632632+ A a host address
633633+634634+ CNAME identifies the canonical name of an
635635+ alias
636636+637637+ HINFO identifies the CPU and OS used by a host
638638+639639+ MX identifies a mail exchange for the
640640+ domain. See [RFC-974] for details.
641641+642642+ NS
643643+ the authoritative name server for the domain
644644+645645+ PTR
646646+ a pointer to another part of the domain name space
647647+648648+ SOA
649649+ identifies the start of a zone of authority
650650+651651+class which is an encoded 16 bit value which identifies a
652652+ protocol family or instance of a protocol.
653653+654654+ This memo uses the following classes:
655655+656656+ IN the Internet system
657657+658658+ CH the Chaos system
659659+660660+TTL which is the time to live of the RR. This field is a 32
661661+ bit integer in units of seconds, and is primarily used by
662662+ resolvers when they cache RRs. The TTL describes how
663663+ long a RR can be cached before it should be discarded.
671671+672672+673673+RDATA which is the type and sometimes class dependent data
674674+ which describes the resource:
675675+676676+ A For the IN class, a 32 bit IP address
677677+678678+ For the CH class, a domain name followed
679679+ by a 16 bit octal Chaos address.
680680+681681+ CNAME a domain name.
682682+683683+ MX a 16 bit preference value (lower is
684684+ better) followed by a host name willing
685685+ to act as a mail exchange for the owner
686686+ domain.
687687+688688+ NS a host name.
689689+690690+ PTR a domain name.
691691+692692+ SOA several fields.
693693+694694+The owner name is often implicit, rather than forming an integral part
695695+of the RR. For example, many name servers internally form tree or hash
696696+structures for the name space, and chain RRs off nodes. The remaining
697697+RR parts are the fixed header (type, class, TTL) which is consistent for
698698+all RRs, and a variable part (RDATA) that fits the needs of the resource
699699+being described.
700700+701701+The meaning of the TTL field is a time limit on how long an RR can be
702702+kept in a cache. This limit does not apply to authoritative data in
703703+zones; it is also timed out, but by the refreshing policies for the
704704+zone. The TTL is assigned by the administrator for the zone where the
705705+data originates. While short TTLs can be used to minimize caching, and
706706+a zero TTL prohibits caching, the realities of Internet performance
707707+suggest that these times should be on the order of days for the typical
708708+host. If a change can be anticipated, the TTL can be reduced prior to
709709+the change to minimize inconsistency during the change, and then
710710+increased back to its former value following the change.
711711+712712+The data in the RDATA section of RRs is carried as a combination of
713713+binary strings and domain names. The domain names are frequently used
714714+as "pointers" to other data in the DNS.
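
The RR fields and the TTL rule described above can be sketched as an
OCaml record. This is only an illustration: the type, constructor, and
function names are assumptions made for this example, the RDATA
variants cover just the cases listed in this section, and timestamps
are taken to be plain integer seconds.

```ocaml
(* Sketch of the RR fields from section 3.6. All names are illustrative
   assumptions; the RFC does not define a programming interface. *)

type rr_type = A | CNAME | HINFO | MX | NS | PTR | SOA
type rr_class = IN | CH

type rdata =
  | Address of string         (* A in class IN: a 32 bit address, kept as text *)
  | Name of string            (* CNAME, NS, PTR: a domain or host name *)
  | Exchange of int * string  (* MX: preference (lower is better), host name *)
  | Opaque of string          (* anything this sketch does not model *)

type rr = {
  owner : string;   (* domain name where the RR is found *)
  typ : rr_type;
  cls : rr_class;
  ttl : int;        (* seconds; a 32 bit value on the wire *)
  rdata : rdata;
}

(* A cached RR may be used until its TTL has elapsed. With a zero TTL
   this is never true, so the RR is effectively not cached. *)
let usable_from_cache ~cached_at ~now rr = now - cached_at < rr.ttl

(* One of the address RRs used in the examples of this memo, with an
   assumed TTL of one day. *)
let venera =
  { owner = "VENERA.ISI.EDU."; typ = A; cls = IN; ttl = 86_400;
    rdata = Address "128.9.0.32" }
```

Note that this check applies only to cached copies. Authoritative data
in a zone is timed out by the refreshing policies of the zone instead.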
715715+716716+3.6.1. Textual expression of RRs
717717+718718+RRs are represented in binary form in the packets of the DNS protocol,
719719+and are usually represented in highly encoded form when stored in a name
720720+server or resolver. In this memo, we adopt a style similar to that used
727727+728728+729729+in master files in order to show the contents of RRs. In this format,
730730+most RRs are shown on a single line, although continuation lines are
731731+possible using parentheses.
732732+733733+The start of the line gives the owner of the RR. If a line begins with
734734+a blank, then the owner is assumed to be the same as that of the
735735+previous RR. Blank lines are often included for readability.
736736+737737+Following the owner, we list the TTL, type, and class of the RR. Class
738738+and type use the mnemonics defined above, and TTL is an integer before
739739+the type field. In order to avoid ambiguity in parsing, type and class
740740+mnemonics are disjoint, TTLs are integers, and the type mnemonic is
741741+always last. The IN class and TTL values are often omitted from examples
742742+in the interests of clarity.
743743+744744+The resource data or RDATA section of the RR is given using knowledge
745745+of the typical representation for the data.
746746+747747+For example, we might show the RRs carried in a message as:
748748+749749+ ISI.EDU. MX 10 VENERA.ISI.EDU.
750750+ MX 10 VAXA.ISI.EDU.
751751+ VENERA.ISI.EDU. A 128.9.0.32
752752+ A 10.1.0.52
753753+ VAXA.ISI.EDU. A 10.2.0.27
754754+ A 128.9.0.33
755755+756756+The MX RRs have an RDATA section which consists of a 16 bit number
757757+followed by a domain name. The address RRs use a standard IP address
758758+format to contain a 32 bit internet address.
759759+760760+This example shows six RRs, with two RRs at each of three domain names.
761761+762762+Similarly we might see:
763763+764764+ XX.LCS.MIT.EDU. IN A 10.0.0.44
765765+ CH A MIT.EDU. 2420
766766+767767+This example shows two addresses for XX.LCS.MIT.EDU, each of a different
768768+class.
769769+770770+3.6.2. Aliases and canonical names
771771+772772+In existing systems, hosts and other resources often have several names
773773+that identify the same resource. For example, the names C.ISI.EDU and
774774+USC-ISIC.ARPA both identify the same host. Similarly, in the case of
775775+mailboxes, many organizations provide many names that actually go to the
776776+same mailbox; for example Mockapetris@C.ISI.EDU, Mockapetris@B.ISI.EDU,
783783+784784+785785+and PVM@ISI.EDU all go to the same mailbox (although the mechanism
786786+behind this is somewhat complicated).
787787+788788+Most of these systems have a notion that one of the equivalent set of
789789+names is the canonical or primary name and all others are aliases.
790790+791791+The domain system provides such a feature using the canonical name
792792+(CNAME) RR. A CNAME RR identifies its owner name as an alias, and
793793+specifies the corresponding canonical name in the RDATA section of the
794794+RR. If a CNAME RR is present at a node, no other data should be
795795+present; this ensures that the data for a canonical name and its aliases
796796+cannot be different. This rule also insures that a cached CNAME can be
797797+used without checking with an authoritative server for other RR types.
798798+799799+CNAME RRs cause special action in DNS software. When a name server
800800+fails to find a desired RR in the resource set associated with the
801801+domain name, it checks to see if the resource set consists of a CNAME
802802+record with a matching class. If so, the name server includes the CNAME
803803+record in the response and restarts the query at the domain name
804804+specified in the data field of the CNAME record. The one exception to
805805+this rule is that queries which match the CNAME type are not restarted.
806806+807807+For example, suppose a name server was processing a query for
808808+USC-ISIC.ARPA, asking for type A information, and had the following resource
809809+records:
810810+811811+ USC-ISIC.ARPA IN CNAME C.ISI.EDU
812812+813813+ C.ISI.EDU IN A 10.0.0.52
814814+815815+Both of these RRs would be returned in the response to the type A query,
816816+while a type CNAME or * query should return just the CNAME.
817817+818818+Domain names in RRs which point at another name should always point at
819819+the primary name and not the alias. This avoids extra indirections in
820820+accessing information. For example, the address to name RR for the
821821+above host should be:
822822+823823+ 52.0.0.10.IN-ADDR.ARPA IN PTR C.ISI.EDU
824824+825825+rather than pointing at USC-ISIC.ARPA. Of course, by the robustness
826826+principle, domain software should not fail when presented with CNAME
827827+chains or loops; CNAME chains should be followed and CNAME loops
828828+signalled as an error.
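
A minimal sketch of the rule above, following CNAME chains and
signalling loops as an error, might look as follows in OCaml. The
association list standing in for the database, the single-record-per-
name simplification, and every name in the code are assumptions made
for this example only. Keys in the list are assumed to be stored in
lower case.

```ocaml
(* Sketch of CNAME chain following with loop detection (section 3.6.2).
   The data model and all names are illustrative assumptions. *)

type data = Cname of string | Addr of string

type outcome =
  | Answer of string list * string  (* aliases traversed, then the address *)
  | No_data of string               (* chain ended at a name with no record *)
  | Loop of string                  (* name at which a loop was detected *)

let resolve_a (db : (string * data) list) (qname : string) : outcome =
  let rec follow seen aliases name =
    let key = String.lowercase_ascii name in
    if List.mem key seen then Loop name
    else
      match List.assoc_opt key db with
      | None -> No_data name
      | Some (Addr a) -> Answer (List.rev aliases, a)
      | Some (Cname target) -> follow (key :: seen) (name :: aliases) target
  in
  follow [] [] qname

(* The two RRs from the example in this section. *)
let example_db =
  [ ("usc-isic.arpa", Cname "C.ISI.EDU"); ("c.isi.edu", Addr "10.0.0.52") ]
```

Keeping the list of names already visited is one simple way to detect a
loop. A real server could equally bound the chain length instead.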
829829+830830+3.7. Queries
831831+832832+Queries are messages which may be sent to a name server to provoke a
839839+840840+841841+response. In the Internet, queries are carried in UDP datagrams or over
842842+TCP connections. The response by the name server either answers the
843843+question posed in the query, refers the requester to another set of name
844844+servers, or signals some error condition.
845845+846846+In general, the user does not generate queries directly, but instead
847847+makes a request to a resolver which in turn sends one or more queries to
848848+name servers and deals with the error conditions and referrals that may
849849+result. Of course, the possible questions which can be asked in a query
850850+do shape the kind of service a resolver can provide.
851851+852852+DNS queries and responses are carried in a standard message format. The
853853+message format has a header containing a number of fixed fields which
854854+are always present, and four sections which carry query parameters and
855855+RRs.
856856+857857+The most important field in the header is a four bit field called an
858858+opcode which separates different queries. Of the possible 16 values,
859859+one (standard query) is part of the official protocol, two (inverse
860860+query and status query) are options, one (completion) is obsolete, and
861861+the rest are unassigned.
862862+863863+The four sections are:
864864+865865+Question Carries the query name and other query parameters.
866866+867867+Answer Carries RRs which directly answer the query.
868868+869869+Authority Carries RRs which describe other authoritative servers.
870870+ May optionally carry the SOA RR for the authoritative
871871+ data in the answer section.
872872+873873+Additional Carries RRs which may be helpful in using the RRs in the
874874+ other sections.
875875+876876+Note that the content, but not the format, of these sections varies with
877877+header opcode.
878878+879879+3.7.1. Standard queries
880880+881881+A standard query specifies a target domain name (QNAME), query type
882882+(QTYPE), and query class (QCLASS) and asks for RRs which match. This
883883+type of query makes up such a vast majority of DNS queries that we use
884884+the term "query" to mean standard query unless otherwise specified. The
885885+QTYPE and QCLASS fields are each 16 bits long, and are a superset of
886886+defined types and classes.
895895+896896+897897+The QTYPE field may contain:
898898+899899+<any type> matches just that type. (e.g., A, PTR).
900900+901901+AXFR special zone transfer QTYPE.
902902+903903+MAILB matches all mail box related RRs (e.g. MB and MG).
904904+905905+* matches all RR types.
906906+907907+The QCLASS field may contain:
908908+909909+<any class> matches just that class (e.g., IN, CH).
910910+911911+* matches all RR classes.
912912+913913+Using the query domain name, QTYPE, and QCLASS, the name server looks
914914+for matching RRs. In addition to relevant records, the name server may
915915+return RRs that point toward a name server that has the desired
916916+information or RRs that are expected to be useful in interpreting the
917917+relevant RRs. For example, a name server that doesn't have the
918918+requested information may know a name server that does; a name server
919919+that returns a domain name in a relevant RR may also return the RR that
920920+binds that domain name to an address.
921921+922922+For example, a mailer trying to send mail to Mockapetris@ISI.EDU might
923923+ask the resolver for mail information about ISI.EDU, resulting in a
924924+query for QNAME=ISI.EDU, QTYPE=MX, QCLASS=IN. The response's answer
925925+section would be:
926926+927927+ ISI.EDU. MX 10 VENERA.ISI.EDU.
928928+ MX 10 VAXA.ISI.EDU.
929929+930930+while the additional section might be:
931931+932932+ VAXA.ISI.EDU. A 10.2.0.27
933933+ A 128.9.0.33
934934+ VENERA.ISI.EDU. A 10.1.0.52
935935+ A 128.9.0.32
936936+937937+Because the server assumes that if the requester wants mail exchange
938938+information, it will probably want the addresses of the mail exchanges
939939+soon afterward.
940940+941941+Note that the QCLASS=* construct requires special interpretation
942942+regarding authority. Since a particular name server may not know all of
943943+the classes available in the domain system, it can never know if it is
944944+authoritative for all classes. Hence responses to QCLASS=* queries can
951951+952952+953953+never be authoritative.
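
The matching of QNAME, QTYPE, and QCLASS against RRs described in this
section can be sketched in OCaml as below. The types and function names
are assumptions made for this example. Only the MB and MG types named
in the text are treated as mail box related, and AXFR is deliberately
left unmatched because a zone transfer is not an ordinary RR lookup.

```ocaml
(* Sketch of standard query matching (section 3.7.1).
   All names are illustrative assumptions. *)

type rr_type = A | NS | CNAME | SOA | PTR | HINFO | MX | MB | MG
type rr_class = IN | CH

type qtype = Type of rr_type | AXFR | MAILB | Any_type   (* Any_type is "*" *)
type qclass = Class of rr_class | Any_class              (* Any_class is "*" *)

type rr = { owner : string; typ : rr_type; cls : rr_class; rdata : string }

let qtype_matches q t =
  match q with
  | Type t' -> t' = t
  | MAILB -> (match t with MB | MG -> true | _ -> false)
  | Any_type -> true
  | AXFR -> false  (* handled by zone transfer, not by RR matching *)

let qclass_matches q c =
  match q with
  | Class c' -> c' = c
  | Any_class -> true

(* RRs whose owner equals QNAME (ignoring case) and which match the
   requested type and class. *)
let matching_rrs ~qname ~qtype ~qclass rrs =
  List.filter
    (fun rr ->
      String.lowercase_ascii rr.owner = String.lowercase_ascii qname
      && qtype_matches qtype rr.typ
      && qclass_matches qclass rr.cls)
    rrs
```

With the ISI.EDU records from the example above, a query for
QNAME=ISI.EDU., QTYPE=MX, QCLASS=IN selects the two MX RRs. Choosing
the address RRs for the additional section is a separate step that
this sketch does not cover.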
954954+955955+3.7.2. Inverse queries (Optional)
956956+957957+Name servers may also support inverse queries that map a particular
958958+resource to a domain name or domain names that have that resource. For
959959+example, while a standard query might map a domain name to a SOA RR, the
960960+corresponding inverse query might map the SOA RR back to the domain
961961+name.
962962+963963+Implementation of this service is optional in a name server, but all
964964+name servers must at least be able to understand an inverse query
965965+message and return a not-implemented error response.
966966+967967+The domain system cannot guarantee the completeness or uniqueness of
968968+inverse queries because the domain system is organized by domain name
969969+rather than by host address or any other resource type. Inverse queries
970970+are primarily useful for debugging and database maintenance activities.
971971+972972+Inverse queries may not return the proper TTL, and do not indicate cases
973973+where the identified RR is one of a set (for example, one address for a
974974+host having multiple addresses). Therefore, the RRs returned in inverse
975975+queries should never be cached.
976976+977977+Inverse queries are NOT an acceptable method for mapping host addresses
978978+to host names; use the IN-ADDR.ARPA domain instead.
979979+980980+A detailed discussion of inverse queries is contained in [RFC-1035].
981981+982982+3.8. Status queries (Experimental)
983983+984984+To be defined.
985985+986986+3.9. Completion queries (Obsolete)
987987+988988+The optional completion services described in RFCs 882 and 883 have been
989989+deleted. Redesigned services may become available in the future, or the
990990+opcodes may be reclaimed for other use.
991991+992992+4. NAME SERVERS
993993+994994+4.1. Introduction
995995+996996+Name servers are the repositories of information that make up the domain
997997+database. The database is divided up into sections called zones, which
998998+are distributed among the name servers. While name servers can have
999999+several optional functions and sources of data, the essential task of a
10001000+name server is to answer queries using data in its zones. By design,
10071007+10081008+10091009+name servers can answer queries in a simple manner; the response can
10101010+always be generated using only local data, and either contains the
10111011+answer to the question or a referral to other name servers "closer" to
10121012+the desired information.
10131013+10141014+A given zone will be available from several name servers to insure its
10151015+availability in spite of host or communication link failure. By
10161016+administrative fiat, we require every zone to be available on at least
10171017+two servers, and many zones have more redundancy than that.
10181018+10191019+A given name server will typically support one or more zones, but this
10201020+gives it authoritative information about only a small section of the
10211021+domain tree. It may also have some cached non-authoritative data about
10221022+other parts of the tree. The name server marks its responses to queries
10231023+so that the requester can tell whether the response comes from
10241024+authoritative data or not.
10251025+10261026+4.2. How the database is divided into zones
10271027+10281028+The domain database is partitioned in two ways: by class, and by "cuts"
10291029+made in the name space between nodes.
10301030+10311031+The class partition is simple. The database for any class is organized,
10321032+delegated, and maintained separately from all other classes. Since, by
10331033+convention, the name spaces are the same for all classes, the separate
10341034+classes can be thought of as an array of parallel namespace trees. Note
10351035+that the data attached to nodes will be different for these different
10361036+parallel classes. The most common reasons for creating a new class are
10371037+the necessity for a new data format for existing types or a desire for a
10381038+separately managed version of the existing name space.
10391039+10401040+Within a class, "cuts" in the name space can be made between any two
10411041+adjacent nodes. After all cuts are made, each group of connected name
10421042+space is a separate zone. The zone is said to be authoritative for all
10431043+names in the connected region. Note that the "cuts" in the name space
10441044+may be in different places for different classes, the name servers may
10451045+be different, etc.
10461046+10471047+These rules mean that every zone has at least one node, and hence domain
10481048+name, for which it is authoritative, and all of the nodes in a
10491049+particular zone are connected. Given the tree structure, every zone
10501050+has a highest node which is closer to the root than any other node in
10511051+the zone. The name of this node is often used to identify the zone.
10521052+10531053+It would be possible, though not particularly useful, to partition the
10541054+name space so that each domain name was in a separate zone or so that
10551055+all nodes were in a single zone. Instead, the database is partitioned
10561056+at points where a particular organization wants to take over control of
10631063+10641064+10651065+a subtree. Once an organization controls its own zone it can
10661066+unilaterally change the data in the zone, grow new tree sections
10671067+connected to the zone, delete existing nodes, or delegate new subzones
10681068+under its zone.
10691069+10701070+If the organization has substructure, it may want to make further
10711071+internal partitions to achieve nested delegations of name space control.
10721072+In some cases, such divisions are made purely to make database
10731073+maintenance more convenient.
10741074+10751075+4.2.1. Technical considerations
10761076+10771077+The data that describes a zone has four major parts:
10781078+10791079+ - Authoritative data for all nodes within the zone.
10801080+10811081+ - Data that defines the top node of the zone (can be thought of
10821082+ as part of the authoritative data).
10831083+10841084+ - Data that describes delegated subzones, i.e., cuts around the
10851085+ bottom of the zone.
10861086+10871087+ - Data that allows access to name servers for subzones
10881088+ (sometimes called "glue" data).
10891089+10901090+All of this data is expressed in the form of RRs, so a zone can be
10911091+completely described in terms of a set of RRs. Whole zones can be
10921092+transferred between name servers by transferring the RRs, either carried
10931093+in a series of messages or by FTPing a master file which is a textual
10941094+representation.
10951095+10961096+The authoritative data for a zone is simply all of the RRs attached to
10971097+all of the nodes from the top node of the zone down to leaf nodes or
10981098+nodes above cuts around the bottom edge of the zone.
10991099+11001100+Though logically part of the authoritative data, the RRs that describe
11011101+the top node of the zone are especially important to the zone's
11021102+management. These RRs are of two types: name server RRs that list, one
11031103+per RR, all of the servers for the zone, and a single SOA RR that
11041104+describes zone management parameters.
11051105+11061106+The RRs that describe cuts around the bottom of the zone are NS RRs that
11071107+name the servers for the subzones. Since the cuts are between nodes,
11081108+these RRs are NOT part of the authoritative data of the zone, and should
11091109+be exactly the same as the corresponding RRs in the top node of the
11101110+subzone. Since name servers are always associated with zone boundaries,
11111111+NS RRs are only found at nodes which are the top node of some zone. In
11121112+the data that makes up a zone, NS RRs are found at the top node of the
11191119+11201120+11211121+zone (and are authoritative) and at cuts around the bottom of the zone
11221122+(where they are not authoritative), but never in between.
11231123+11241124+One of the goals of the zone structure is that any zone have all the
11251125+data required to set up communications with the name servers for any
11261126+subzones. That is, parent zones have all the information needed to
11271127+access servers for their children zones. The NS RRs that name the
11281128+servers for subzones are often not enough for this task since they name
11291129+the servers, but do not give their addresses. In particular, if the
11301130+name of the name server is itself in the subzone, we could be faced with
11311131+the situation where the NS RRs tell us that in order to learn a name
11321132+server's address, we should contact the server using the address we wish
11331133+to learn. To fix this problem, a zone contains "glue" RRs which are not
11341134+part of the authoritative data, and are address RRs for the servers.
11351135+These RRs are only necessary if the name server's name is "below" the
11361136+cut, and are only used as part of a referral response.
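
The test for whether glue is needed can be sketched as a label-wise
suffix comparison. In the OCaml below, needs_glue and its arguments are
names assumed for this example: subzone is the name of the top node of
the delegated subzone, and ns_name is the name of one of its servers.

```ocaml
(* Sketch of the glue rule in section 4.2.1: glue is only necessary if
   the name server's name is at or below the top node of the subzone.
   All names are illustrative assumptions. *)

(* Labels of a name in lower case; a trailing dot is ignored. *)
let labels name =
  String.split_on_char '.' (String.lowercase_ascii name)
  |> List.filter (fun l -> l <> "")

(* True when [name] equals [ancestor] or is one of its subdomains.
   Labels are compared whole, starting from the root end. *)
let is_at_or_below ~ancestor name =
  let rec is_prefix a n =
    match a, n with
    | [], _ -> true
    | x :: a', y :: n' -> x = y && is_prefix a' n'
    | _ :: _, [] -> false
  in
  is_prefix (List.rev (labels ancestor)) (List.rev (labels name))

let needs_glue ~subzone ~ns_name = is_at_or_below ~ancestor:subzone ns_name
```

Comparing whole labels rather than raw string suffixes matters here: a
server named XISI.EDU is not below ISI.EDU even though the strings
share a suffix.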
11371137+11381138+4.2.2. Administrative considerations
11391139+11401140+When some organization wants to control its own domain, the first step
11411141+is to identify the proper parent zone, and get the parent zone's owners
11421142+to agree to the delegation of control. While there are no particular
11431143+technical constraints dealing with where in the tree this can be done,
11441144+there are some administrative groupings discussed in [RFC-1032] which
11451145+deal with top level organization, and middle level zones are free to
11461146+create their own rules. For example, one university might choose to use
11471147+a single zone, while another might choose to organize by subzones
11481148+dedicated to individual departments or schools. [RFC-1033] catalogs
11491149+available DNS software and discusses administration procedures.
11501150+11511151+Once the proper name for the new subzone is selected, the new owners
11521152+should be required to demonstrate redundant name server support. Note
11531153+that there is no requirement that the servers for a zone reside in a
11541154+host which has a name in that domain. In many cases, a zone will be
11551155+more accessible to the internet at large if its servers are widely
11561156+distributed rather than being within the physical facilities controlled
11571157+by the same organization that manages the zone. For example, in the
11581158+current DNS, one of the name servers for the United Kingdom, or UK
11591159+domain, is found in the US. This allows US hosts to get UK data without
11601160+using limited transatlantic bandwidth.
11611161+11621162+As the last installation step, the delegation NS RRs and glue RRs
11631163+necessary to make the delegation effective should be added to the parent
11641164+zone. The administrators of both zones should insure that the NS and
11651165+glue RRs which mark both sides of the cut are consistent and remain so.
11661166+11671167+4.3. Name server internals
11751175+11761176+11771177+4.3.1. Queries and responses
11781178+11791179+The principal activity of name servers is to answer standard queries.
11801180+Both the query and its response are carried in a standard message format
11811181+which is described in [RFC-1035]. The query contains a QTYPE, QCLASS,
11821182+and QNAME, which describe the types and classes of desired information
11831183+and the name of interest.
11841184+11851185+The way that the name server answers the query depends upon whether it
11861186+is operating in recursive mode or not:
11871187+11881188+ - The simplest mode for the server is non-recursive, since it
11891189+ can answer queries using only local information: the response
11901190+ contains an error, the answer, or a referral to some other
11911191+ server "closer" to the answer. All name servers must
11921192+ implement non-recursive queries.
11931193+11941194+ - The simplest mode for the client is recursive, since in this
11951195+ mode the name server acts in the role of a resolver and
11961196+ returns either an error or the answer, but never referrals.
11971197+ This service is optional in a name server, and the name server
11981198+ may also choose to restrict the clients which can use
11991199+ recursive mode.
12001200+12011201+Recursive service is helpful in several situations:
12021202+12031203+ - a relatively simple requester that lacks the ability to use
12041204+ anything other than a direct answer to the question.
12051205+12061206+ - a request that needs to cross protocol or other boundaries and
12071207+ can be sent to a server which can act as intermediary.
12081208+12091209+ - a network where we want to concentrate the cache rather than
12101210+ having a separate cache for each client.
12111211+12121212+Non-recursive service is appropriate if the requester is capable of
12131213+pursuing referrals and interested in information which will aid future
12141214+requests.
12151215+12161216+The use of recursive mode is limited to cases where both the client and
12171217+the name server agree to its use. The agreement is negotiated through
12181218+the use of two bits in query and response messages:
12191219+12201220+ - The recursion available, or RA bit, is set or cleared by a
12211221+ name server in all responses. The bit is true if the name
12221222+ server is willing to provide recursive service for the client,
12231223+ regardless of whether the client requested recursive service.
12241224+ That is, RA signals availability rather than use.
12311231+12321232+12331233+ - Queries contain a bit called recursion desired or RD. This
12341234+ bit specifies whether the requester wants recursive
12351235+ service for this query. Clients may request recursive service
12361236+ from any name server, though they should depend upon receiving
12371237+ it only from servers which have previously sent an RA, or
12381238+ servers which have agreed to provide service through private
12391239+ agreement or some other means outside of the DNS protocol.
12401240+12411241+The recursive mode occurs when a query with RD set arrives at a server
12421242+which is willing to provide recursive service; the client can verify
12431243+that recursive mode was used by checking that both RA and RD are set in
12441244+the reply. Note that the name server should never perform recursive
12451245+service unless asked via RD, since this interferes with trouble shooting
12461246+of name servers and their databases.
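
The negotiation through the RD and RA bits can be summarized in a few
lines of OCaml. The record and function names are assumptions made for
this example. The sketch also assumes that the RD bit of the query is
carried over into the reply, which is what allows the client to check
both bits there.

```ocaml
(* Sketch of the RD/RA negotiation in section 4.3.1.
   All names are illustrative assumptions. *)

type flags = { rd : bool; ra : bool }

(* RA signals availability rather than use: it reflects only the
   server's willingness, whether or not recursion was requested. *)
let reply_flags ~server_offers_recursion ~(query : flags) =
  { rd = query.rd; ra = server_offers_recursion }

(* The server performs recursive service only when it is both
   available and asked for via RD. *)
let should_recurse ~server_offers_recursion ~(query : flags) =
  server_offers_recursion && query.rd

(* Client-side check that recursive mode was used for a reply. *)
let recursive_mode_used (reply : flags) = reply.rd && reply.ra
```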
12471247+12481248+If recursive service is requested and available, the recursive response
12491249+to a query will be one of the following:
12501250+12511251+ - The answer to the query, possibly prefaced by one or more CNAME
12521252+ RRs that specify aliases encountered on the way to an answer.
12531253+12541254+ - A name error indicating that the name does not exist. This
12551255+ may include CNAME RRs that indicate that the original query
12561256+ name was an alias for a name which does not exist.
12571257+12581258+ - A temporary error indication.
12591259+12601260+If recursive service is not requested or is not available, the non-
12611261+recursive response will be one of the following:
12621262+12631263+ - An authoritative name error indicating that the name does not
12641264+ exist.
12651265+12661266+ - A temporary error indication.
12671267+12681268+ - Some combination of:
12691269+12701270+ RRs that answer the question, together with an indication
12711271+ whether the data comes from a zone or is cached.
12721272+12731273+ A referral to name servers which have zones which are closer
12741274+ ancestors to the name than the server sending the reply.
12751275+12761276+ - RRs that the name server thinks will prove useful to the
12771277+ requester.
12781278+12791279+12801280+12811281+12821282+12831283+12841284+Mockapetris [Page 23]
12851285+12861286+RFC 1034 Domain Concepts and Facilities November 1987
12871287+12881288+12891289+4.3.2. Algorithm
12901290+12911291+The actual algorithm used by the name server will depend on the local OS
12921292+and data structures used to store RRs. The following algorithm assumes
12931293+that the RRs are organized in several tree structures, one for each
12941294+zone, and another for the cache:
12951295+12961296+ 1. Set or clear the value of recursion available in the response
12971297+ depending on whether the name server is willing to provide
12981298+ recursive service. If recursive service is available and
12991299+ requested via the RD bit in the query, go to step 5,
13001300+ otherwise step 2.
13011301+13021302+ 2. Search the available zones for the zone which is the nearest
13031303+ ancestor to QNAME. If such a zone is found, go to step 3,
13041304+ otherwise step 4.
13051305+13061306+ 3. Start matching down, label by label, in the zone. The
13071307+ matching process can terminate several ways:
13081308+13091309+ a. If the whole of QNAME is matched, we have found the
13101310+ node.
13111311+13121312+ If the data at the node is a CNAME, and QTYPE doesn't
13131313+ match CNAME, copy the CNAME RR into the answer section
13141314+ of the response, change QNAME to the canonical name in
13151315+ the CNAME RR, and go back to step 1.
13161316+13171317+ Otherwise, copy all RRs which match QTYPE into the
13181318+ answer section and go to step 6.
13191319+13201320+ b. If a match would take us out of the authoritative data,
13211321+ we have a referral. This happens when we encounter a
13221322+ node with NS RRs marking cuts along the bottom of a
13231323+ zone.
13241324+13251325+ Copy the NS RRs for the subzone into the authority
13261326+ section of the reply. Put whatever addresses are
13271327+ available into the additional section, using glue RRs
13281328+ if the addresses are not available from authoritative
13291329+ data or the cache. Go to step 4.
13301330+13311331+ c. If at some label, a match is impossible (i.e., the
13321332+ corresponding label does not exist), look to see if
13331333+ the "*" label exists.
13341334+13351335+ If the "*" label does not exist, check whether the name
13361336+ we are looking for is the original QNAME in the query
13431343+13441344+13451345+ or a name we have followed due to a CNAME. If the name
13461346+ is original, set an authoritative name error in the
13471347+ response and exit. Otherwise just exit.
13481348+13491349+ If the "*" label does exist, match RRs at that node
13501350+ against QTYPE. If any match, copy them into the answer
13511351+ section, but set the owner of the RR to be QNAME, and
13521352+ not the node with the "*" label. Go to step 6.
13531353+13541354+ 4. Start matching down in the cache. If QNAME is found in the
13551355+ cache, copy all RRs attached to it that match QTYPE into the
13561356+ answer section. If there was no delegation from
13571357+ authoritative data, look for the best one from the cache, and
13581358+ put it in the authority section. Go to step 6.
13591359+13601360+ 5. Use the local resolver or a copy of its algorithm (see
13611361+ resolver section of this memo) to answer the query. Store
13621362+ the results, including any intermediate CNAMEs, in the answer
13631363+ section of the response.
13641364+13651365+ 6. Using local data only, attempt to add other RRs which may be
13661366+ useful to the additional section of the query. Exit.
13671367+13681368+4.3.3. Wildcards
13691369+13701370+In the previous algorithm, special treatment was given to RRs with owner
13711371+names starting with the label "*". Such RRs are called wildcards.
13721372+Wildcard RRs can be thought of as instructions for synthesizing RRs.
13731373+When the appropriate conditions are met, the name server creates RRs
13741374+with an owner name equal to the query name and contents taken from the
13751375+wildcard RRs.
13761376+13771377+This facility is most often used to create a zone which will be used to
13781378+forward mail from the Internet to some other mail system. The general
13791379+idea is that any name in that zone which is presented to the server in a
13801380+query will be assumed to exist, with certain properties, unless explicit
13811381+evidence exists to the contrary. Note that the use of the term zone
13821382+here, instead of domain, is intentional; such defaults do not propagate
13831383+across zone boundaries, although a subzone may choose to achieve that
13841384+appearance by setting up similar defaults.
13851385+13861386+The contents of the wildcard RRs follow the usual rules and formats for
13871387+RRs. The wildcards in the zone have an owner name that controls the
13881388+query names they will match. The owner name of the wildcard RRs is of
13891389+the form "*.<anydomain>", where <anydomain> is any domain name.
13901390+<anydomain> should not contain other * labels, and should be in the
13911391+authoritative data of the zone. The wildcards potentially apply to
13921392+descendants of <anydomain>, but not to <anydomain> itself. Another way
13991399+14001400+14011401+to look at this is that the "*" label always matches at least one whole
14021402+label and sometimes more, but always whole labels.
14031403+14041404+Wildcard RRs do not apply:
14051405+14061406+ - When the query is in another zone. That is, delegation cancels
14071407+ the wildcard defaults.
14081408+14091409+ - When the query name or a name between the wildcard domain and
14101410+ the query name is know to exist. For example, if a wildcard
14111411+ RR has an owner name of "*.X", and the zone also contains RRs
14121412+ attached to B.X, the wildcards would apply to queries for name
14131413+ Z.X (presuming there is no explicit information for Z.X), but
14141414+ not to B.X, A.B.X, or X.
14151415+14161416+A * label appearing in a query name has no special effect, but can be
14171417+used to test for wildcards in an authoritative zone; such a query is the
14181418+only way to get a response containing RRs with an owner name with * in
14191419+it. The result of such a query should not be cached.
14201420+14211421+Note that the contents of the wildcard RRs are not modified when used to
14221422+synthesize RRs.
14231423+14241424+To illustrate the use of wildcard RRs, suppose a large company with a
14251425+large, non-IP/TCP, network wanted to create a mail gateway. If the
14261426+company was called X.COM, and the IP/TCP capable gateway machine was called
14271427+A.X.COM, the following RRs might be entered into the COM zone:
14281428+14291429+ X.COM MX 10 A.X.COM
14301430+14311431+ *.X.COM MX 10 A.X.COM
14321432+14331433+ A.X.COM A 1.2.3.4
14341434+ A.X.COM MX 10 A.X.COM
14351435+14361436+ *.A.X.COM MX 10 A.X.COM
14371437+14381438+This would cause any MX query for any domain name ending in X.COM to
14391439+return an MX RR pointing at A.X.COM. Two wildcard RRs are required
14401440+since the effect of the wildcard at *.X.COM is inhibited in the A.X.COM
14411441+subtree by the explicit data for A.X.COM. Note also that the explicit
14421442+MX data at X.COM and A.X.COM is required, and that none of the RRs above
14431443+would match a query name of XX.COM.
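
The wildcard rules above can be sketched in a few lines of Python. This is only an illustration, not part of the RFC: the function name `wildcard_for` and the representation of a zone as a plain list of owner names are assumptions, and zone cuts (delegations), which also cancel wildcards, are deliberately ignored. The sketch finds the closest existing ancestor of the query name and checks whether a "*" label exists directly below it.

```python
def _labels(name):
    """Split a domain name into upper-cased labels, ignoring a trailing dot."""
    return tuple(label.upper() for label in name.rstrip(".").split(".") if label)


def wildcard_for(qname, owner_names):
    """Return the wildcard owner name that would apply to qname, or None.

    Hypothetical helper: owner_names is every owner name in one zone.
    Delegation to subzones is not modelled.
    """
    # Every owner name, and every ancestor of an owner name, is a node
    # that is known to exist in the tree.
    nodes = set()
    for owner in owner_names:
        labels = _labels(owner)
        for i in range(len(labels) + 1):
            nodes.add(labels[i:])

    q = _labels(qname)
    if q in nodes:
        return None  # explicit data (or an existing node) wins

    # Walk toward the root until the closest existing ancestor is found.
    for i in range(1, len(q) + 1):
        ancestor = q[i:]
        if ancestor in nodes:
            candidate = ("*",) + ancestor
            return ".".join(candidate) if candidate in nodes else None
    return None
```

With the mail gateway RRs above as input, a query name such as FOO.X.COM maps to "*.X.COM", FOO.A.X.COM maps to "*.A.X.COM", and XX.COM maps to nothing.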
14441444+14451445+4.3.4. Negative response caching (Optional)
14461446+14471447+The DNS provides an optional service which allows name servers to
14481448+distribute, and resolvers to cache, negative results with TTLs. For
14551455+14561456+14571457+example, a name server can distribute a TTL along with a name error
14581458+indication, and a resolver receiving such information is allowed to
14591459+assume that the name does not exist during the TTL period without
14601460+consulting authoritative data. Similarly, a resolver can make a query
14611461+with a QTYPE which matches multiple types, and cache the fact that some
14621462+of the types are not present.
14631463+14641464+This feature can be particularly important in a system which implements
14651465+naming shorthands that use search lists because a popular shorthand,
14661466+which happens to require a suffix toward the end of the search list,
14671467+will generate multiple name errors whenever it is used.
14681468+14691469+The method is that a name server may add an SOA RR to the additional
14701470+section of a response when that response is authoritative. The SOA must
14711471+be that of the zone which was the source of the authoritative data in
14721472+the answer section, or name error if applicable. The MINIMUM field of
14731473+the SOA controls the length of time that the negative result may be
14741474+cached.
14751475+14761476+Note that in some circumstances, the answer section may contain multiple
14771477+owner names. In this case, the SOA mechanism should only be used for
14781478+the data which matches QNAME, which is the only authoritative data in
14791479+this section.
14801480+14811481+Name servers and resolvers should never attempt to add SOAs to the
14821482+additional section of a non-authoritative response, or attempt to infer
14831483+results which are not directly stated in an authoritative response.
14841484+There are several reasons for this, including: cached information isn't
14851485+usually enough to match up RRs and their zone names, SOA RRs may be
14861486+cached due to direct SOA queries, and name servers are not required to
14871487+output the SOAs in the authority section.
14881488+14891489+This feature is optional, although a refined version is expected to
14901490+become part of the standard protocol in the future. Name servers are
14911491+not required to add the SOA RRs in all authoritative responses, nor are
14921492+resolvers required to cache negative results. Both are recommended.
14931493+All resolvers and recursive name servers are required to at least be
14941494+able to ignore the SOA RR when it is present in a response.
14951495+14961496+Some experiments have also been proposed which will use this feature.
14971497+The idea is that if cached data is known to come from a particular zone,
14981498+and if an authoritative copy of the zone's SOA is obtained, and if the
14991499+zone's SERIAL has not changed since the data was cached, then the TTL of
15001500+the cached data can be reset to the zone MINIMUM value if it is smaller.
15011501+This usage is mentioned for planning purposes only, and is not
15021502+recommended as yet.
15111511+15121512+15131513+4.3.5. Zone maintenance and transfers
15141514+15151515+Part of the job of a zone administrator is to maintain the zones at all
15161516+of the name servers which are authoritative for the zone. When the
15171517+inevitable changes are made, they must be distributed to all of the name
15181518+servers. While this distribution can be accomplished using FTP or some
15191519+other ad hoc procedure, the preferred method is the zone transfer part
15201520+of the DNS protocol.
15211521+15221522+The general model of automatic zone transfer or refreshing is that one
15231523+of the name servers is the master or primary for the zone. Changes are
15241524+coordinated at the primary, typically by editing a master file for the
15251525+zone. After editing, the administrator signals the master server to
15261526+load the new zone. The other non-master or secondary servers for the
15271527+zone periodically check for changes (at a selectable interval) and
15281528+obtain new zone copies when changes have been made.
15291529+15301530+To detect changes, secondaries just check the SERIAL field of the SOA
15311531+for the zone. In addition to whatever other changes are made, the
15321532+SERIAL field in the SOA of the zone is always advanced whenever any
15331533+change is made to the zone. The advancing can be a simple increment, or
15341534+could be based on the write date and time of the master file, etc. The
15351535+purpose is to make it possible to determine which of two copies of a
15361536+zone is more recent by comparing serial numbers. Serial number advances
15371537+and comparisons use sequence space arithmetic, so there is a theoretic
15381538+limit on how fast a zone can be updated, basically that old copies must
15391539+die out before the serial number covers half of its 32 bit range. In
15401540+practice, the only concern is that the compare operation deals properly
15411541+with comparisons around the boundary between the most positive and most
15421542+negative 32 bit numbers.
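
The text does not spell out the comparison itself, so the following Python sketch shows one common reading of "sequence space arithmetic" for 32 bit serials. The function name `serial_is_newer` is an assumption made for illustration. A serial counts as newer when it is ahead of the other by less than half of the 32 bit range, which is what makes the comparison behave across the wrap-around boundary.

```python
SERIAL_BITS = 32
MODULUS = 1 << SERIAL_BITS        # size of the serial number space
HALF = 1 << (SERIAL_BITS - 1)     # half of the 32 bit range


def serial_is_newer(candidate, current):
    """Return True if `candidate` is ahead of `current` in sequence space.

    Illustrative only: a distance of exactly half the range is treated
    as "not newer", since the ordering is ambiguous at that point.
    """
    distance = (candidate - current) % MODULUS
    return 0 < distance < HALF
```

For example, serial 0 is treated as newer than 4294967295, because the counter has wrapped rather than gone backwards.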
15431543+15441544+The periodic polling of the secondary servers is controlled by
15451545+parameters in the SOA RR for the zone, which set the minimum acceptable
15461546+polling intervals. The parameters are called REFRESH, RETRY, and
15471547+EXPIRE. Whenever a new zone is loaded in a secondary, the secondary
15481548+waits REFRESH seconds before checking with the primary for a new serial.
15491549+If this check cannot be completed, new checks are started every RETRY
15501550+seconds. The check is a simple query to the primary for the SOA RR of
15511551+the zone. If the serial field in the secondary's zone copy is equal to
15521552+the serial returned by the primary, then no changes have occurred, and
15531553+the REFRESH interval wait is restarted. If the secondary finds it
15541554+impossible to perform a serial check for the EXPIRE interval, it must
15551555+assume that its copy of the zone is obsolete and discard it.
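
As a rough sketch of the polling rules just described, the decision a secondary makes at any moment could look like the Python below. The function name, the argument names, and the choice to measure EXPIRE from the last successful check are assumptions for illustration; the RFC leaves these details to the implementation.

```python
def secondary_next_step(now, last_success, last_attempt, refresh, retry, expire):
    """Decide what a secondary should do at time `now` (all values in seconds).

    last_success: time of the last completed serial check (or zone load).
    last_attempt: time of the most recent check, successful or not.
    Returns "discard", "check", or "wait".
    """
    # No successful serial check for the whole EXPIRE interval:
    # the zone copy must be considered obsolete.
    if now - last_success >= expire:
        return "discard"

    if last_attempt <= last_success:
        # The last check completed, so wait out the REFRESH interval.
        due = last_success + refresh
    else:
        # The last check failed, so try again every RETRY seconds.
        due = last_attempt + retry

    return "check" if now >= due else "wait"
```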
15561556+15571557+When the poll shows that the zone has changed, then the secondary server
15581558+must request a zone transfer via an AXFR request for the zone. The AXFR
15591559+may cause an error, such as refused, but normally is answered by a
15601560+sequence of response messages. The first and last messages must contain
15671567+15681568+15691569+the data for the top authoritative node of the zone. Intermediate
15701570+messages carry all of the other RRs from the zone, including both
15711571+authoritative and non-authoritative RRs. The stream of messages allows
15721572+the secondary to construct a copy of the zone. Because accuracy is
15731573+essential, TCP or some other reliable protocol must be used for AXFR
15741574+requests.
15751575+15761576+Each secondary server is required to perform the following operations
15771577+against the master, but may also optionally perform these operations
15781578+against other secondary servers. This strategy can improve the transfer
15791579+process when the primary is unavailable due to host downtime or network
15801580+problems, or when a secondary server has better network access to an
15811581+"intermediate" secondary than to the primary.
15821582+15831583+5. RESOLVERS
15841584+15851585+5.1. Introduction
15861586+15871587+Resolvers are programs that interface user programs to domain name
15881588+servers. In the simplest case, a resolver receives a request from a
15891589+user program (e.g., mail programs, TELNET, FTP) in the form of a
15901590+subroutine call, system call etc., and returns the desired information
15911591+in a form compatible with the local host's data formats.
15921592+15931593+The resolver is located on the same machine as the program that requests
15941594+the resolver's services, but it may need to consult name servers on
15951595+other hosts. Because a resolver may need to consult several name
15961596+servers, or may have the requested information in a local cache, the
15971597+amount of time that a resolver will take to complete can vary quite a
15981598+bit, from milliseconds to several seconds.
15991599+16001600+A very important goal of the resolver is to eliminate network delay and
16011601+name server load from most requests by answering them from its cache of
16021602+prior results. It follows that caches which are shared by multiple
16031603+processes, users, machines, etc., are more efficient than non-shared
16041604+caches.
16051605+16061606+5.2. Client-resolver interface
16071607+16081608+5.2.1. Typical functions
16091609+16101610+The client interface to the resolver is influenced by the local host's
16111611+conventions, but the typical resolver-client interface has three
16121612+functions:
16131613+16141614+ 1. Host name to host address translation.
16151615+16161616+ This function is often defined to mimic a previous HOSTS.TXT
16231623+16241624+16251625+ based function. Given a character string, the caller wants
16261626+ one or more 32 bit IP addresses. Under the DNS, it
16271627+ translates into a request for type A RRs. Since the DNS does
16281628+ not preserve the order of RRs, this function may choose to
16291629+ sort the returned addresses or select the "best" address if
16301630+ the service returns only one choice to the client. Note that
16311631+ a multiple address return is recommended, but a single
16321632+ address may be the only way to emulate prior HOSTS.TXT
16331633+ services.
16341634+16351635+ 2. Host address to host name translation
16361636+16371637+ This function will often follow the form of previous
16381638+ functions. Given a 32 bit IP address, the caller wants a
16391639+ character string. The octets of the IP address are reversed,
16401640+ used as name components, and suffixed with "IN-ADDR.ARPA". A
16411641+ type PTR query is used to get the RR with the primary name of
16421642+ the host. For example, a request for the host name
16431643+ corresponding to IP address 1.2.3.4 looks for PTR RRs for
16441644+ domain name "4.3.2.1.IN-ADDR.ARPA".
16451645+16461646+ 3. General lookup function
16471647+16481648+ This function retrieves arbitrary information from the DNS,
16491649+ and has no counterpart in previous systems. The caller
16501650+ supplies a QNAME, QTYPE, and QCLASS, and wants all of the
16511651+ matching RRs. This function will often use the DNS format
16521652+ for all RR data instead of the local host's, and returns all
16531653+ RR content (e.g., TTL) instead of a processed form with local
16541654+ quoting conventions.
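
The name construction used by the host address to host name function (item 2 above) is mechanical enough to show directly. The Python sketch below is illustrative; the function name `reverse_lookup_name` is an assumption, and only the construction of the query name is shown, not the PTR query itself.

```python
def reverse_lookup_name(ipv4):
    """Build the IN-ADDR.ARPA domain name for a dotted-quad IPv4 address."""
    octets = ipv4.split(".")
    valid = len(octets) == 4 and all(
        o.isascii() and o.isdigit() and 0 <= int(o) <= 255 for o in octets
    )
    if not valid:
        raise ValueError("not a dotted-quad IPv4 address: %r" % (ipv4,))
    # Reverse the octets, use them as labels, and add the suffix.
    return ".".join(reversed(octets)) + ".IN-ADDR.ARPA"
```

For the address 1.2.3.4 this yields "4.3.2.1.IN-ADDR.ARPA", matching the example in the text.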
16551655+16561656+When the resolver performs the indicated function, it usually has one of
16571657+the following results to pass back to the client:
16581658+16591659+ - One or more RRs giving the requested data.
16601660+16611661+ In this case the resolver returns the answer in the
16621662+ appropriate format.
16631663+16641664+ - A name error (NE).
16651665+16661666+ This happens when the referenced name does not exist. For
16671667+ example, a user may have mistyped a host name.
16681668+16691669+ - A data not found error.
16701670+16711671+ This happens when the referenced name exists, but data of the
16721672+ appropriate type does not. For example, a host address
16791679+16801680+16811681+ function applied to a mailbox name would return this error
16821682+ since the name exists, but no address RR is present.
16831683+16841684+It is important to note that the functions for translating between host
16851685+names and addresses may combine the "name error" and "data not found"
16861686+error conditions into a single type of error return, but the general
16871687+function should not. One reason for this is that applications may ask
16881688+first for one type of information about a name followed by a second
16891689+request to the same name for some other type of information; if the two
16901690+errors are combined, then useless queries may slow the application.
16911691+16921692+5.2.2. Aliases
16931693+16941694+While attempting to resolve a particular request, the resolver may find
16951695+that the name in question is an alias. For example, the resolver might
16961696+find that the name given for host name to address translation is an
16971697+alias when it finds the CNAME RR. If possible, the alias condition
16981698+should be signalled back from the resolver to the client.
16991699+17001700+In most cases a resolver simply restarts the query at the new name when
17011701+it encounters a CNAME. However, when performing the general function,
17021702+the resolver should not pursue aliases when the CNAME RR matches the
17031703+query type. This allows queries which ask whether an alias is present.
17041704+For example, if the query type is CNAME, the user is interested in the
17051705+CNAME RR itself, and not the RRs at the name it points to.
17061706+17071707+Several special conditions can occur with aliases. Multiple levels of
17081708+aliases should be avoided due to their lack of efficiency, but should
17091709+not be signalled as an error. Alias loops and aliases which point to
17101710+non-existent names should be caught and an error condition passed back
17111711+to the client.
17121712+17131713+5.2.3. Temporary failures
17141714+17151715+In a less than perfect world, all resolvers will occasionally be unable
17161716+to resolve a particular request. This condition can be caused by a
17171717+resolver which becomes separated from the rest of the network due to a
17181718+link failure or gateway problem, or less often by coincident failure or
17191719+unavailability of all servers for a particular domain.
17201720+17211721+It is essential that this sort of condition should not be signalled as a
17221722+name or data not present error to applications. This sort of behavior
17231723+is annoying to humans, and can wreak havoc when mail systems use the
17241724+DNS.
17251725+17261726+While in some cases it is possible to deal with such a temporary problem
17271727+by blocking the request indefinitely, this is usually not a good choice,
17281728+particularly when the client is a server process that could move on to
17351735+17361736+17371737+other tasks. The recommended solution is to always have temporary
17381738+failure as one of the possible results of a resolver function, even
17391739+though this may make emulation of existing HOSTS.TXT functions more
17401740+difficult.
17411741+17421742+5.3. Resolver internals
17431743+17441744+Every resolver implementation uses slightly different algorithms, and
17451745+typically spends much more logic dealing with errors of various sorts
17461746+than typical occurrences. This section outlines a recommended basic
17471747+strategy for resolver operation, but leaves details to [RFC-1035].
17481748+17491749+5.3.1. Stub resolvers
17501750+17511751+One option for implementing a resolver is to move the resolution
17521752+function out of the local machine and into a name server which supports
17531753+recursive queries. This can provide an easy method of providing domain
17541754+service in a PC which lacks the resources to perform the resolver
17551755+function, or can centralize the cache for a whole local network or
17561756+organization.
17571757+17581758+All that the remaining stub needs is a list of name server addresses
17591759+that will perform the recursive requests. This type of resolver
17601760+presumably needs the information in a configuration file, since it
17611761+probably lacks the sophistication to locate it in the domain database.
17621762+The user also needs to verify that the listed servers will perform the
17631763+recursive service; a name server is free to refuse to perform recursive
17641764+services for any or all clients. The user should consult the local
17651765+system administrator to find name servers willing to perform the
17661766+service.
17671767+17681768+This type of service suffers from some drawbacks. Since the recursive
17691769+requests may take an arbitrary amount of time to perform, the stub may
17701770+have difficulty optimizing retransmission intervals to deal with both
17711771+lost UDP packets and dead servers; the name server can be easily
17721772+overloaded by too zealous a stub if it interprets retransmissions as new
17731773+requests. Use of TCP may be an answer, but TCP may well place burdens
17741774+on the host's capabilities which are similar to those of a real
17751775+resolver.
17761776+17771777+5.3.2. Resources
17781778+17791779+In addition to its own resources, the resolver may also have shared
17801780+access to zones maintained by a local name server. This gives the
17811781+resolver the advantage of more rapid access, but the resolver must be
17821782+careful to never let cached information override zone data. In this
17831783+discussion the term "local information" is meant to mean the union of
17841784+the cache and such shared zones, with the understanding that
17911791+17921792+17931793+authoritative data is always used in preference to cached data when both
17941794+are present.
17951795+17961796+The following resolver algorithm assumes that all functions have been
17971797+converted to a general lookup function, and uses the following data
17981798+structures to represent the state of a request in progress in the
17991799+resolver:
18001800+18011801+SNAME the domain name we are searching for.
18021802+18031803+STYPE the QTYPE of the search request.
18041804+18051805+SCLASS the QCLASS of the search request.
18061806+18071807+SLIST a structure which describes the name servers and the
18081808+ zone which the resolver is currently trying to query.
18091809+ This structure keeps track of the resolver's current
18101810+ best guess about which name servers hold the desired
18111811+ information; it is updated when arriving information
18121812+ changes the guess. This structure includes the
18131813+ equivalent of a zone name, the known name servers for
18141814+ the zone, the known addresses for the name servers, and
18151815+ history information which can be used to suggest which
18161816+ server is likely to be the best one to try next. The
18171817+ zone name equivalent is a match count of the number of
18181818+ labels from the root down which SNAME has in common with
18191819+ the zone being queried; this is used as a measure of how
18201820+ "close" the resolver is to SNAME.
18211821+18221822+SBELT a "safety belt" structure of the same form as SLIST,
18231823+ which is initialized from a configuration file, and
18241824+ lists servers which should be used when the resolver
18251825+ doesn't have any local information to guide name server
18261826+ selection. The match count will be -1 to indicate that
18271827+ no labels are known to match.
18281828+18291829+CACHE A structure which stores the results from previous
18301830+ responses. Since resolvers are responsible for
18311831+ discarding old RRs whose TTL has expired, most
18321832+ implementations convert the interval specified in
18331833+ arriving RRs to some sort of absolute time when the RR
18341834+ is stored in the cache. Instead of counting the TTLs
18351835+ down individually, the resolver just ignores or discards
18361836+ old RRs when it runs across them in the course of a
18371837+ search, or discards them during periodic sweeps to
18381838+ reclaim the memory consumed by old RRs.
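
The CACHE behaviour described above, converting each TTL to an absolute expiry time and discarding stale RRs lazily or in sweeps, might be sketched as follows. The class name `RRCache`, its methods, and the injectable clock are assumptions for illustration and are not defined by the RFC.

```python
import time


class RRCache:
    """Minimal TTL cache sketch: stores (expiry time, rdata) per name and type."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so tests can control time
        self._entries = {}           # (NAME, rtype) -> list of (expires_at, rdata)

    def store(self, name, rtype, rdata, ttl):
        # Convert the relative TTL into an absolute expiry time once.
        expires_at = self._clock() + ttl
        self._entries.setdefault((name.upper(), rtype), []).append((expires_at, rdata))

    def lookup(self, name, rtype):
        # Old RRs are dropped when the search runs across them.
        key = (name.upper(), rtype)
        now = self._clock()
        live = [(exp, data) for exp, data in self._entries.get(key, []) if exp > now]
        if live:
            self._entries[key] = live
        else:
            self._entries.pop(key, None)
        return [data for _, data in live]

    def sweep(self):
        # Periodic sweep to reclaim memory held by expired RRs.
        for key in list(self._entries):
            self.lookup(*key)
```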
18471847+18481848+18491849+5.3.3. Algorithm
18501850+18511851+The top level algorithm has four steps:
18521852+18531853+ 1. See if the answer is in local information, and if so return
18541854+ it to the client.
18551855+18561856+ 2. Find the best servers to ask.
18571857+18581858+ 3. Send them queries until one returns a response.
18591859+18601860+ 4. Analyze the response, either:
18611861+18621862+ a. if the response answers the question or contains a name
18631863+ error, cache the data as well as returning it back to
18641864+ the client.
18651865+18661866+ b. if the response contains a better delegation to other
18671867+ servers, cache the delegation information, and go to
18681868+ step 2.
18691869+18701870+ c. if the response shows a CNAME and that is not the
18711871+ answer itself, cache the CNAME, change the SNAME to the
18721872+ canonical name in the CNAME RR and go to step 1.
18731873+18741874+ d. if the response shows a server failure or other
18751875+ bizarre contents, delete the server from the SLIST and
18761876+ go back to step 3.
18771877+18781878+Step 1 searches the cache for the desired data. If the data is in the
18791879+cache, it is assumed to be good enough for normal use. Some resolvers
18801880+have an option at the user interface which will force the resolver to
18811881+ignore the cached data and consult with an authoritative server. This
18821882+is not recommended as the default. If the resolver has direct access to
18831883+a name server's zones, it should check to see if the desired data is
18841884+present in authoritative form, and if so, use the authoritative data in
18851885+preference to cached data.
18861886+18871887+Step 2 looks for a name server to ask for the required data. The
18881888+general strategy is to look for locally-available name server RRs,
18891889+starting at SNAME, then the parent domain name of SNAME, the
18901890+grandparent, and so on toward the root. Thus if SNAME were
18911891+Mockapetris.ISI.EDU, this step would look for NS RRs for
18921892+Mockapetris.ISI.EDU, then ISI.EDU, then EDU, and then . (the root).
18931893+These NS RRs list the names of hosts for a zone at or above SNAME. Copy
18941894+the names into SLIST. Set up their addresses using local data. It may
18951895+be the case that the addresses are not available. The resolver has many
18961896+choices here; the best is to start parallel resolver processes looking
19031903+19041904+19051905+for the addresses while continuing onward with the addresses which are
19061906+available. Obviously, the design choices and options are complicated
19071907+and a function of the local host's capabilities. The recommended
19081908+priorities for the resolver designer are:
19091909+19101910+ 1. Bound the amount of work (packets sent, parallel processes
19111911+ started) so that a request can't get into an infinite loop or
19121912+ start off a chain reaction of requests or queries with other
19131913+ implementations EVEN IF SOMEONE HAS INCORRECTLY CONFIGURED
19141914+ SOME DATA.
19151915+19161916+ 2. Get back an answer if at all possible.
19171917+19181918+ 3. Avoid unnecessary transmissions.
19191919+19201920+ 4. Get the answer as quickly as possible.
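
The search order that step 2 uses when looking for NS RRs (SNAME first, then each ancestor up to the root) can be written as a short Python sketch. The function name `ns_search_order` is an assumption for illustration; the actual lookup of NS RRs in local data is left out.

```python
def ns_search_order(sname):
    """List the domain names to check for NS RRs, from SNAME toward the root."""
    labels = [label for label in sname.rstrip(".").split(".") if label]
    # Drop one leading label at a time, then finish with the root.
    return [".".join(labels[i:]) for i in range(len(labels))] + ["."]
```

For SNAME Mockapetris.ISI.EDU this gives Mockapetris.ISI.EDU, ISI.EDU, EDU, and "." in that order, as in the example from the text.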
19211921+19221922+If the search for NS RRs fails, then the resolver initializes SLIST from
19231923+the safety belt SBELT. The basic idea is that when the resolver has no
19241924+idea what servers to ask, it should use information from a configuration
19251925+file that lists several servers which are expected to be helpful.
19261926+Although there are special situations, the usual choice is two of the
19271927+root servers and two of the servers for the host's domain. The reason
19281928+for two of each is for redundancy. The root servers will provide
19291929+eventual access to all of the domain space. The two local servers will
19301930+allow the resolver to continue to resolve local names if the local
19311931+network becomes isolated from the internet due to gateway or link
19321932+failure.
19331933+19341934+In addition to the names and addresses of the servers, the SLIST data
19351935+structure can be sorted to use the best servers first, and to ensure
19361936+that all addresses of all servers are used in a round-robin manner. The
19371937+sorting can be a simple function of preferring addresses on the local
19381938+network over others, or may involve statistics from past events, such as
19391939+previous response times and batting averages.
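One possible sorting function is sketched below. It is an assumption-laden illustration: the name `sort_slist`, the use of a CIDR prefix to describe the local network, and the `avg_rtt` table of past response times are all hypothetical and are not defined by this document.

```python
import ipaddress


def sort_slist(addresses, local_net, avg_rtt=None):
    """Order server addresses: local-network addresses first, then by
    average past response time (unknown times sort last).

    addresses -- list of dotted-quad strings
    local_net -- CIDR string for the local network (an assumption)
    avg_rtt   -- optional dict mapping address -> seconds (hypothetical stats)
    """
    net = ipaddress.ip_network(local_net)
    avg_rtt = avg_rtt or {}

    def key(addr):
        on_local_net = ipaddress.ip_address(addr) in net
        return (0 if on_local_net else 1, avg_rtt.get(addr, float("inf")))

    # sorted() is stable, so ties keep their original relative order.
    return sorted(addresses, key=key)


print(sort_slist(["10.0.0.51", "26.0.0.73", "26.3.0.103"], "26.0.0.0/8"))
```

Treating "net 26" as the prefix 26.0.0.0/8 is a modern reading of the classful network number and is used here only for the example.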
19401940+19411941+Step 3 sends out queries until a response is received. The strategy is
19421942+to cycle around all of the addresses for all of the servers with a
19431943+timeout between each transmission. In practice it is important to use
19441944+all addresses of a multihomed host, and too aggressive a retransmission
19451945+policy actually slows response when used by multiple resolvers
19461946+contending for the same name server and even occasionally for a single
19471947+resolver. SLIST typically contains data values to control the timeouts
19481948+and keep track of previous transmissions.
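The cycling strategy, combined with the first design priority (bounding the amount of work), can be sketched as a finite transmission schedule. This is only an illustration; `transmission_schedule` and the `max_transmissions` limit are hypothetical names, and real resolvers also apply timeouts between transmissions.

```python
from itertools import cycle, islice


def transmission_schedule(slist, max_transmissions):
    """Return the ordered (server, address) pairs to try.

    slist             -- list of (server_name, [addresses]) entries
    max_transmissions -- hard upper bound on packets sent (an assumption)
    """
    pairs = [(name, addr) for name, addrs in slist for addr in addrs]
    # cycle() walks every address of every server, then starts over;
    # islice() enforces the bound so the loop always terminates.
    return list(islice(cycle(pairs), max_transmissions))


slist = [
    ("SRI-NIC.ARPA.", ["26.0.0.73", "10.0.0.51"]),
    ("A.ISI.EDU.", ["26.3.0.103"]),
]
for server, address in transmission_schedule(slist, 4):
    print(server, address)
```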
19491949+19501950+Step 4 involves analyzing responses. The resolver should be highly
19511951+paranoid in its parsing of responses. It should also check that the
19521952+response matches the query it sent using the ID field in the response.
19591959+19601960+19611961+The ideal answer is one from a server authoritative for the query which
19621962+either gives the required data or a name error. The data is passed back
19631963+to the user and entered in the cache for future use if its TTL is
19641964+greater than zero.
19651965+19661966+If the response shows a delegation, the resolver should check to see
19671967+that the delegation is "closer" to the answer than the servers in SLIST
19681968+are. This can be done by comparing the match count in SLIST with that
19691969+computed from SNAME and the NS RRs in the delegation. If not, the reply
19701970+is bogus and should be ignored. If the delegation is valid the NS
19711971+delegation RRs and any address RRs for the servers should be cached.
19721972+The name servers are entered in the SLIST, and the search is restarted.
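The closeness check can be sketched as a label comparison. The sketch below assumes that the match count includes the root label, which is consistent with the count of 3 used for ISI.EDU in the resolution example of section 6.3.1; the function names are hypothetical.

```python
def _labels(name):
    """Split a domain name into upper-cased labels (comparison ignores case)."""
    return [label.upper() for label in name.rstrip(".").split(".") if label]


def match_count(sname, zone):
    """Number of labels of SNAME matched by zone, counting the root label.

    Returns None when zone is not at or above SNAME in the tree.
    """
    s, z = _labels(sname), _labels(zone)
    if len(z) > len(s) or s[len(s) - len(z):] != z:
        return None
    return len(z) + 1


def delegation_is_closer(sname, delegated_zone, slist_match_count):
    """True if a delegation is closer to SNAME than the current SLIST."""
    count = match_count(sname, delegated_zone)
    return count is not None and count > slist_match_count


# A delegation to ISI.EDU beats an SLIST built from SBELT (match count -1).
print(delegation_is_closer("ISI.EDU.", "ISI.EDU.", -1))
```

A delegation that fails this test would be treated as bogus and ignored, as the text describes.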
19731973+19741974+If the response contains a CNAME, the search is restarted at the
19751975+canonical name unless the response already has the data for that name
19761976+or the CNAME is itself the answer.
19771977+19781978+Details and implementation hints can be found in [RFC-1035].
19791979+19801980+6. A SCENARIO
19811981+19821982+In our sample domain space, suppose we wanted separate administrative
19831983+control for the root, MIL, EDU, MIT.EDU and ISI.EDU zones. We might
19841984+allocate name servers as follows:
19851985+
19861986+
19871987+                   |(C.ISI.EDU,SRI-NIC.ARPA
19881988+                   | A.ISI.EDU)
19891989+             +---------------------+------------------+
19901990+             |                     |                  |
19911991+            MIL                   EDU                ARPA
19921992+             |(SRI-NIC.ARPA,       |(SRI-NIC.ARPA,    |
19931993+             | A.ISI.EDU)          | C.ISI.EDU)       |
19941994+       +-----+-----+               |     +------+-----+-----+
19951995+       |     |     |               |     |      |           |
19961996+      BRL  NOSC  DARPA             |  IN-ADDR  SRI-NIC     ACC
19971997+                                   |
19981998+       +--------+------------------+---------------+--------+
19991999+       |        |                  |               |        |
20002000+      UCI      MIT                 |              UDEL     YALE
20012001+                |(XX.LCS.MIT.EDU, ISI
20022002+                |ACHILLES.MIT.EDU) |(VAXA.ISI.EDU,VENERA.ISI.EDU,
20032003+            +---+---+              | A.ISI.EDU)
20042004+            |       |              |
20052005+           LCS   ACHILLES +--+-----+-----+--------+
20062006+            |             |  |     |     |        |
20072007+            XX            A  C   VAXA  VENERA Mockapetris
20152015+20162016+20172017+In this example, the authoritative name server is shown in parentheses
20182018+at the point in the domain tree at which it assumes control.
20192019+20202020+Thus the root name servers are on C.ISI.EDU, SRI-NIC.ARPA, and
20212021+A.ISI.EDU. The MIL domain is served by SRI-NIC.ARPA and A.ISI.EDU. The
20222022+EDU domain is served by SRI-NIC.ARPA and C.ISI.EDU. Note that servers
20232023+may have zones which are contiguous or disjoint. In this scenario,
20242024+C.ISI.EDU has contiguous zones at the root and EDU domains. A.ISI.EDU
20252025+has contiguous zones at the root and MIL domains, but also has a non-
20262026+contiguous zone at ISI.EDU.
20272027+20282028+6.1. C.ISI.EDU name server
20292029+20302030+C.ISI.EDU is a name server for the root and EDU domains of the IN
20312031+class, and would have zones for these domains. The zone data for the
20322032+root domain might be:
20332033+20342034+ . IN SOA SRI-NIC.ARPA. HOSTMASTER.SRI-NIC.ARPA. (
20352035+ 870611 ;serial
20362036+ 1800 ;refresh every 30 min
20372037+ 300 ;retry every 5 min
20382038+ 604800 ;expire after a week
20392039+ 86400) ;minimum of a day
20402040+ NS A.ISI.EDU.
20412041+ NS C.ISI.EDU.
20422042+ NS SRI-NIC.ARPA.
20432043+20442044+ MIL. 86400 NS SRI-NIC.ARPA.
20452045+ 86400 NS A.ISI.EDU.
20462046+20472047+ EDU. 86400 NS SRI-NIC.ARPA.
20482048+ 86400 NS C.ISI.EDU.
20492049+20502050+ SRI-NIC.ARPA. A 26.0.0.73
20512051+ A 10.0.0.51
20522052+ MX 0 SRI-NIC.ARPA.
20532053+ HINFO DEC-2060 TOPS20
20542054+20552055+ ACC.ARPA. A 26.6.0.65
20562056+ HINFO PDP-11/70 UNIX
20572057+ MX 10 ACC.ARPA.
20582058+20592059+ USC-ISIC.ARPA. CNAME C.ISI.EDU.
20602060+20612061+ 73.0.0.26.IN-ADDR.ARPA. PTR SRI-NIC.ARPA.
20622062+ 65.0.6.26.IN-ADDR.ARPA. PTR ACC.ARPA.
20632063+ 51.0.0.10.IN-ADDR.ARPA. PTR SRI-NIC.ARPA.
20642064+ 52.0.0.10.IN-ADDR.ARPA. PTR C.ISI.EDU.
20712071+20722072+20732073+ 103.0.3.26.IN-ADDR.ARPA. PTR A.ISI.EDU.
20742074+20752075+ A.ISI.EDU. 86400 A 26.3.0.103
20762076+ C.ISI.EDU. 86400 A 10.0.0.52
20772077+20782078+This data is represented as it would be in a master file. Most RRs are
20792079+single line entries; the sole exception here is the SOA RR, which uses
20802080+"(" to start a multi-line RR and ")" to show the end of a multi-line RR.
20812081+Since the class of all RRs in a zone must be the same, only the first RR
20822082+in a zone need specify the class. When a name server loads a zone, it
20832083+forces the TTL of all authoritative RRs to be at least the MINIMUM field
20842084+of the SOA, here 86400 seconds, or one day. The NS RRs marking
20852085+delegation of the MIL and EDU domains, together with the glue RRs for
20862086+the servers' host addresses, are not part of the authoritative data in
20872087+the zone, and hence have explicit TTLs.
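The TTL rule for authoritative RRs described above amounts to taking a maximum. The sketch below is illustrative only; `effective_ttl` is a hypothetical name, and the rule applies to authoritative data, not to the delegation NS RRs and glue, which carry their own explicit TTLs.

```python
def effective_ttl(explicit_ttl, soa_minimum):
    """TTL a server would assign to an authoritative RR when loading a zone.

    explicit_ttl -- TTL given in the master file, or None if omitted
    soa_minimum  -- MINIMUM field of the zone's SOA RR, in seconds
    """
    if explicit_ttl is None:
        return soa_minimum
    # Lower values are raised to the minimum; higher values are kept.
    return max(explicit_ttl, soa_minimum)


# With the root zone's SOA MINIMUM of 86400 seconds (one day):
print(effective_ttl(None, 86400))    # TTL omitted in the master file
print(effective_ttl(172800, 86400))  # explicitly higher value is kept
```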
20882088+20892089+Four RRs are attached to the root node: the SOA which describes the root
20902090+zone and the 3 NS RRs which list the name servers for the root. The
20912091+data in the SOA RR describes the management of the zone. The zone data
20922092+is maintained on host SRI-NIC.ARPA, and the responsible party for the
20932093+zone is HOSTMASTER@SRI-NIC.ARPA. A key item in the SOA is the 86400
20942094+second minimum TTL, which means that all authoritative data in the zone
20952095+has at least that TTL, although higher values may be explicitly
20962096+specified.
20972097+20982098+The NS RRs for the MIL and EDU domains mark the boundary between the
20992099+root zone and the MIL and EDU zones. Note that in this example, the
21002100+lower zones happen to be supported by name servers which also support
21012101+the root zone.
21022102+21032103+The master file for the EDU zone might be stated relative to the origin
21042104+EDU. The zone data for the EDU domain might be:
21052105+21062106+ EDU. IN SOA SRI-NIC.ARPA. HOSTMASTER.SRI-NIC.ARPA. (
21072107+ 870729 ;serial
21082108+ 1800 ;refresh every 30 minutes
21092109+ 300 ;retry every 5 minutes
21102110+ 604800 ;expire after a week
21112111+ 86400 ;minimum of a day
21122112+ )
21132113+ NS SRI-NIC.ARPA.
21142114+ NS C.ISI.EDU.
21152115+21162116+ UCI 172800 NS ICS.UCI
21172117+ 172800 NS ROME.UCI
21182118+ ICS.UCI 172800 A 192.5.19.1
21192119+ ROME.UCI 172800 A 192.5.19.31
21272127+21282128+21292129+ ISI 172800 NS VAXA.ISI
21302130+ 172800 NS A.ISI
21312131+ 172800 NS VENERA.ISI.EDU.
21322132+ VAXA.ISI 172800 A 10.2.0.27
21332133+ 172800 A 128.9.0.33
21342134+ VENERA.ISI.EDU. 172800 A 10.1.0.52
21352135+ 172800 A 128.9.0.32
21362136+ A.ISI 172800 A 26.3.0.103
21372137+21382138+ UDEL.EDU. 172800 NS LOUIE.UDEL.EDU.
21392139+ 172800 NS UMN-REI-UC.ARPA.
21402140+ LOUIE.UDEL.EDU. 172800 A 10.0.0.96
21412141+ 172800 A 192.5.39.3
21422142+21432143+ YALE.EDU. 172800 NS YALE.ARPA.
21442144+ YALE.EDU. 172800 NS YALE-BULLDOG.ARPA.
21452145+21462146+ MIT.EDU. 43200 NS XX.LCS.MIT.EDU.
21472147+ 43200 NS ACHILLES.MIT.EDU.
21482148+ XX.LCS.MIT.EDU. 43200 A 10.0.0.44
21492149+ ACHILLES.MIT.EDU. 43200 A 18.72.0.8
21502150+21512151+Note the use of relative names here. The owner name for ISI.EDU. is
21522152+stated using a relative name, as are two of the name server RR contents.
21532153+Relative and absolute domain names may be freely intermixed in a master file.
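How a relative name in the EDU zone data is completed against the origin can be sketched as follows. This is a simplified illustration that treats any name ending in a dot as absolute; `absolute_name` is a hypothetical helper, and full master file parsing is covered in [RFC-1035].

```python
def absolute_name(name, origin):
    """Expand a master-file name to absolute form.

    Names ending in "." are already absolute; anything else is taken
    relative to the origin.
    """
    if name.endswith("."):
        return name
    origin = origin.rstrip(".")
    return f"{name}.{origin}." if origin else f"{name}."


# Entries from the EDU zone data, with origin EDU.
for owner in ("ICS.UCI", "A.ISI", "VENERA.ISI.EDU."):
    print(owner, "->", absolute_name(owner, "EDU."))
```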
21542154+21552155+6.2. Example standard queries
21562156+21572157+The following queries and responses illustrate name server behavior.
21582158+Unless otherwise noted, the queries do not have recursion desired (RD)
21592159+in the header. Note that the answers to non-recursive queries do depend
21602160+on the server being asked, but do not depend on the identity of the
21612161+requester.
21832183+21842184+21852185+6.2.1. QNAME=SRI-NIC.ARPA, QTYPE=A
21862186+21872187+The query would look like:
21882188+21892189+ +---------------------------------------------------+
21902190+ Header | OPCODE=SQUERY |
21912191+ +---------------------------------------------------+
21922192+ Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=A |
21932193+ +---------------------------------------------------+
21942194+ Answer | <empty> |
21952195+ +---------------------------------------------------+
21962196+ Authority | <empty> |
21972197+ +---------------------------------------------------+
21982198+ Additional | <empty> |
21992199+ +---------------------------------------------------+
22002200+22012201+The response from C.ISI.EDU would be:
22022202+22032203+ +---------------------------------------------------+
22042204+ Header | OPCODE=SQUERY, RESPONSE, AA |
22052205+ +---------------------------------------------------+
22062206+ Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=A |
22072207+ +---------------------------------------------------+
22082208+ Answer | SRI-NIC.ARPA. 86400 IN A 26.0.0.73 |
22092209+ | 86400 IN A 10.0.0.51 |
22102210+ +---------------------------------------------------+
22112211+ Authority | <empty> |
22122212+ +---------------------------------------------------+
22132213+ Additional | <empty> |
22142214+ +---------------------------------------------------+
22152215+22162216+The header of the response looks like the header of the query, except
22172217+that the RESPONSE bit is set, indicating that this message is a
22182218+response, not a query, and the Authoritative Answer (AA) bit is set
22192219+indicating that the address RRs in the answer section are from
22202220+authoritative data. The question section of the response matches the
22212221+question section of the query.
22392239+22402240+22412241+If the same query was sent to some other server which was not
22422242+authoritative for SRI-NIC.ARPA, the response might be:
22432243+22442244+ +---------------------------------------------------+
22452245+ Header | OPCODE=SQUERY,RESPONSE |
22462246+ +---------------------------------------------------+
22472247+ Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=A |
22482248+ +---------------------------------------------------+
22492249+ Answer | SRI-NIC.ARPA. 1777 IN A 10.0.0.51 |
22502250+ | 1777 IN A 26.0.0.73 |
22512251+ +---------------------------------------------------+
22522252+ Authority | <empty> |
22532253+ +---------------------------------------------------+
22542254+ Additional | <empty> |
22552255+ +---------------------------------------------------+
22562256+22572257+This response is different from the previous one in two ways: the header
22582258+does not have AA set, and the TTLs are different. The inference is that
22592259+the data did not come from a zone, but from a cache. The difference
22602260+between the authoritative TTL and the TTL here is due to aging of the
22612261+data in a cache. The difference in ordering of the RRs in the answer
22622262+section is not significant.
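The aging of cached data can be sketched as a subtraction of elapsed time from the TTL the record had when it was cached. The names below are hypothetical, and the example assumes the record entered the cache with the full authoritative TTL of 86400 seconds, which the response alone does not prove.

```python
def remaining_ttl(ttl_when_cached, cached_at, now):
    """TTL to report for a cached RR, in whole seconds (never negative).

    ttl_when_cached -- TTL the RR carried when it entered the cache
    cached_at, now  -- timestamps in seconds on the same clock
    """
    return max(0, ttl_when_cached - int(now - cached_at))


# An RR cached with TTL 86400 and read back 84623 seconds later
# would be returned with a TTL of 1777.
print(remaining_ttl(86400, cached_at=0, now=84623))
```

A record whose remaining TTL reaches zero would no longer be served from the cache.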
22632263+22642264+6.2.2. QNAME=SRI-NIC.ARPA, QTYPE=*
22652265+22662266+A query similar to the previous one, but using a QTYPE of *, would
22672267+receive the following response from C.ISI.EDU:
22682268+22692269+ +---------------------------------------------------+
22702270+ Header | OPCODE=SQUERY, RESPONSE, AA |
22712271+ +---------------------------------------------------+
22722272+ Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=* |
22732273+ +---------------------------------------------------+
22742274+ Answer | SRI-NIC.ARPA. 86400 IN A 26.0.0.73 |
22752275+ | A 10.0.0.51 |
22762276+ | MX 0 SRI-NIC.ARPA. |
22772277+ | HINFO DEC-2060 TOPS20 |
22782278+ +---------------------------------------------------+
22792279+ Authority | <empty> |
22802280+ +---------------------------------------------------+
22812281+ Additional | <empty> |
22822282+ +---------------------------------------------------+
22952295+22962296+22972297+If a similar query was directed to two name servers which are not
22982298+authoritative for SRI-NIC.ARPA, the responses might be:
22992299+23002300+ +---------------------------------------------------+
23012301+ Header | OPCODE=SQUERY, RESPONSE |
23022302+ +---------------------------------------------------+
23032303+ Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=* |
23042304+ +---------------------------------------------------+
23052305+ Answer | SRI-NIC.ARPA. 12345 IN A 26.0.0.73 |
23062306+ | A 10.0.0.51 |
23072307+ +---------------------------------------------------+
23082308+ Authority | <empty> |
23092309+ +---------------------------------------------------+
23102310+ Additional | <empty> |
23112311+ +---------------------------------------------------+
23122312+23132313+and
23142314+23152315+ +---------------------------------------------------+
23162316+ Header | OPCODE=SQUERY, RESPONSE |
23172317+ +---------------------------------------------------+
23182318+ Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=* |
23192319+ +---------------------------------------------------+
23202320+ Answer | SRI-NIC.ARPA. 1290 IN HINFO DEC-2060 TOPS20 |
23212321+ +---------------------------------------------------+
23222322+ Authority | <empty> |
23232323+ +---------------------------------------------------+
23242324+ Additional | <empty> |
23252325+ +---------------------------------------------------+
23252325+23262326+Neither of these answers has AA set, so neither response comes from
23282328+authoritative data. The different contents and different TTLs suggest
23292329+that the two servers cached data at different times, and that the first
23302330+server cached the response to a QTYPE=A query and the second cached the
23312331+response to a HINFO query.
23512351+23522352+23532353+6.2.3. QNAME=SRI-NIC.ARPA, QTYPE=MX
23542354+23552355+This type of query might result from a mailer trying to look up
23562356+routing information for the mail destination HOSTMASTER@SRI-NIC.ARPA.
23572357+The response from C.ISI.EDU would be:
23582358+23592359+ +---------------------------------------------------+
23602360+ Header | OPCODE=SQUERY, RESPONSE, AA |
23612361+ +---------------------------------------------------+
23622362+ Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=MX |
23632363+ +---------------------------------------------------+
23642364+ Answer | SRI-NIC.ARPA. 86400 IN MX 0 SRI-NIC.ARPA.|
23652365+ +---------------------------------------------------+
23662366+ Authority | <empty> |
23672367+ +---------------------------------------------------+
23682368+ Additional | SRI-NIC.ARPA. 86400 IN A 26.0.0.73 |
23692369+ | A 10.0.0.51 |
23702370+ +---------------------------------------------------+
23712371+23722372+This response contains the MX RR in the answer section of the response.
23732373+The additional section contains the address RRs because the name server
23742374+at C.ISI.EDU guesses that the requester will need the addresses in order
23752375+to properly use the information carried by the MX.
23762376+23772377+6.2.4. QNAME=SRI-NIC.ARPA, QTYPE=NS
23782378+23792379+C.ISI.EDU would reply to this query with:
23802380+23812381+ +---------------------------------------------------+
23822382+ Header | OPCODE=SQUERY, RESPONSE, AA |
23832383+ +---------------------------------------------------+
23842384+ Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=NS |
23852385+ +---------------------------------------------------+
23862386+ Answer | <empty> |
23872387+ +---------------------------------------------------+
23882388+ Authority | <empty> |
23892389+ +---------------------------------------------------+
23902390+ Additional | <empty> |
23912391+ +---------------------------------------------------+
23922392+23932393+The only difference between the response and the query is the AA and
23942394+RESPONSE bits in the header. The interpretation of this response is
23952395+that the server is authoritative for the name, and the name exists, but
23962396+no RRs of type NS are present there.
23972397+23982398+6.2.5. QNAME=SIR-NIC.ARPA, QTYPE=A
23992399+24002400+If a user mistyped a host name, we might see this type of query.
24072407+24082408+24092409+C.ISI.EDU would answer it with:
24102410+24112411+ +---------------------------------------------------+
24122412+ Header | OPCODE=SQUERY, RESPONSE, AA, RCODE=NE |
24132413+ +---------------------------------------------------+
24142414+ Question | QNAME=SIR-NIC.ARPA., QCLASS=IN, QTYPE=A |
24152415+ +---------------------------------------------------+
24162416+ Answer | <empty> |
24172417+ +---------------------------------------------------+
24182418+ Authority | . SOA SRI-NIC.ARPA. HOSTMASTER.SRI-NIC.ARPA. |
24192419+ | 870611 1800 300 604800 86400 |
24202420+ +---------------------------------------------------+
24212421+ Additional | <empty> |
24222422+ +---------------------------------------------------+
24232423+24242424+This response states that the name does not exist. This condition is
24252425+signalled in the response code (RCODE) section of the header.
24262426+24272427+The SOA RR in the authority section is the optional negative caching
24282428+information which allows the resolver using this response to assume that
24292429+the name will not exist for the SOA MINIMUM (86400) seconds.
24302430+24312431+6.2.6. QNAME=BRL.MIL, QTYPE=A
24322432+24332433+If this query is sent to C.ISI.EDU, the reply would be:
24342434+24352435+ +---------------------------------------------------+
24362436+ Header | OPCODE=SQUERY, RESPONSE |
24372437+ +---------------------------------------------------+
24382438+ Question | QNAME=BRL.MIL., QCLASS=IN, QTYPE=A |
24392439+ +---------------------------------------------------+
24402440+ Answer | <empty> |
24412441+ +---------------------------------------------------+
24422442+ Authority | MIL. 86400 IN NS SRI-NIC.ARPA. |
24432443+ | 86400 NS A.ISI.EDU. |
24442444+ +---------------------------------------------------+
24452445+ Additional | A.ISI.EDU. A 26.3.0.103 |
24462446+ | SRI-NIC.ARPA. A 26.0.0.73 |
24472447+ | A 10.0.0.51 |
24482448+ +---------------------------------------------------+
24492449+24502450+This response has an empty answer section, but is not authoritative, so
24512451+it is a referral. The name server on C.ISI.EDU, realizing that it is
24522452+not authoritative for the MIL domain, has referred the requester to
24532453+servers on A.ISI.EDU and SRI-NIC.ARPA, which it knows are authoritative
24542454+for the MIL domain.
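The response patterns shown in sections 6.2.1 through 6.2.6 can be summarized by a small classifier. This is a simplified sketch: the function name and the category strings are hypothetical, RRs are modelled as (owner, type, data) tuples, "NE" follows the RCODE notation of the figures, and CNAME handling is ignored.

```python
def classify_response(aa, rcode, answer, authority):
    """Classify a response from its header bits and section contents.

    aa        -- True if the Authoritative Answer bit is set
    rcode     -- "NE" for a name error, None otherwise
    answer    -- list of (owner, type, data) tuples in the answer section
    authority -- list of (owner, type, data) tuples in the authority section
    """
    if rcode == "NE":
        return "name error"
    if answer:
        return "authoritative answer" if aa else "non-authoritative answer"
    if aa:
        return "no data of the requested type"
    if any(rr[1] == "NS" for rr in authority):
        return "referral"
    return "unclassified"


# The BRL.MIL query sent to C.ISI.EDU: empty answer, AA clear, NS RRs present.
print(classify_response(False, None, [], [("MIL.", "NS", "SRI-NIC.ARPA."),
                                          ("MIL.", "NS", "A.ISI.EDU.")]))
```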
24632463+24642464+24652465+6.2.7. QNAME=USC-ISIC.ARPA, QTYPE=A
24662466+24672467+The response to this query from A.ISI.EDU would be:
24682468+24692469+ +---------------------------------------------------+
24702470+ Header | OPCODE=SQUERY, RESPONSE, AA |
24712471+ +---------------------------------------------------+
24722472+ Question | QNAME=USC-ISIC.ARPA., QCLASS=IN, QTYPE=A |
24732473+ +---------------------------------------------------+
24742474+ Answer | USC-ISIC.ARPA. 86400 IN CNAME C.ISI.EDU. |
24752475+ | C.ISI.EDU. 86400 IN A 10.0.0.52 |
24762476+ +---------------------------------------------------+
24772477+ Authority | <empty> |
24782478+ +---------------------------------------------------+
24792479+ Additional | <empty> |
24802480+ +---------------------------------------------------+
24812481+24822482+Note that the AA bit in the header guarantees that the data matching
24832483+QNAME is authoritative, but does not say anything about whether the data
24842484+for C.ISI.EDU is authoritative. This complete reply is possible because
24852485+A.ISI.EDU happens to be authoritative for both the ARPA domain where
24862486+USC-ISIC.ARPA is found and the ISI.EDU domain where C.ISI.EDU data is
24872487+found.
24882488+24892489+If the same query was sent to C.ISI.EDU, its response might be the same
24902490+as shown above if it had its own address in its cache, but might also
24912491+be:
25192519+25202520+25212521+ +---------------------------------------------------+
25222522+ Header | OPCODE=SQUERY, RESPONSE, AA |
25232523+ +---------------------------------------------------+
25242524+ Question | QNAME=USC-ISIC.ARPA., QCLASS=IN, QTYPE=A |
25252525+ +---------------------------------------------------+
25262526+ Answer | USC-ISIC.ARPA. 86400 IN CNAME C.ISI.EDU. |
25272527+ +---------------------------------------------------+
25282528+ Authority | ISI.EDU. 172800 IN NS VAXA.ISI.EDU. |
25292529+ | NS A.ISI.EDU. |
25302530+ | NS VENERA.ISI.EDU. |
25312531+ +---------------------------------------------------+
25322532+ Additional | VAXA.ISI.EDU. 172800 A 10.2.0.27 |
25332533+ | 172800 A 128.9.0.33 |
25342534+ | VENERA.ISI.EDU. 172800 A 10.1.0.52 |
25352535+ | 172800 A 128.9.0.32 |
25362536+ | A.ISI.EDU. 172800 A 26.3.0.103 |
25372537+ +---------------------------------------------------+
25382538+25392539+This reply contains an authoritative reply for the alias USC-ISIC.ARPA,
25402540+plus a referral to the name servers for ISI.EDU. This sort of reply
25412541+isn't very likely given that the query is for the host name of the name
25422542+server being asked, but would be common for other aliases.
25432543+25442544+6.2.8. QNAME=USC-ISIC.ARPA, QTYPE=CNAME
25452545+25462546+If this query is sent to either A.ISI.EDU or C.ISI.EDU, the reply would
25472547+be:
25482548+25492549+ +---------------------------------------------------+
25502550+ Header | OPCODE=SQUERY, RESPONSE, AA |
25512551+ +---------------------------------------------------+
25522552+ Question | QNAME=USC-ISIC.ARPA., QCLASS=IN, QTYPE=CNAME |
25532553+ +---------------------------------------------------+
25542554+ Answer | USC-ISIC.ARPA. 86400 IN CNAME C.ISI.EDU. |
25552555+ +---------------------------------------------------+
25562556+ Authority | <empty> |
25572557+ +---------------------------------------------------+
25582558+ Additional | <empty> |
25592559+ +---------------------------------------------------+
25602560+25612561+Because QTYPE=CNAME, the CNAME RR itself answers the query, and the name
25622562+server doesn't attempt to look up anything for C.ISI.EDU. (Except
25632563+possibly for the additional section.)
25642564+25652565+6.3. Example resolution
25662566+25672567+The following examples illustrate the operations a resolver must perform
25682568+for its client. We assume that the resolver is starting without a
25752575+25762576+25772577+cache, as might be the case after system boot. We further assume that
25782578+the system is not one of the hosts in the data and that the host is
25792579+located somewhere on net 26, and that its safety belt (SBELT) data
25802580+structure has the following information:
25812581+25822582+ Match count = -1
25832583+ SRI-NIC.ARPA. 26.0.0.73 10.0.0.51
25842584+ A.ISI.EDU. 26.3.0.103
25852585+25862586+This information specifies servers to try, their addresses, and a match
25872587+count of -1, which says that the servers aren't very close to the
25882588+target. Note that the -1 isn't supposed to be an accurate closeness
25892589+measure, just a value so that later stages of the algorithm will work.
25902590+25912591+The following examples illustrate the use of a cache, so each example
25922592+assumes that previous requests have completed.
25932593+25942594+6.3.1. Resolve MX for ISI.EDU.
25952595+25962596+Suppose the first request to the resolver comes from the local mailer,
25972597+which has mail for PVM@ISI.EDU. The mailer might then ask for type MX
25982598+RRs for the domain name ISI.EDU.
25992599+26002600+The resolver would look in its cache for MX RRs at ISI.EDU, but the
26012601+empty cache wouldn't be helpful. The resolver would recognize that it
26022602+needed to query foreign servers and try to determine the best servers to
26032603+query. This search would look for NS RRs for the domains ISI.EDU, EDU,
26042604+and the root. These searches of the cache would also fail. As a last
26052605+resort, the resolver would use the information from the SBELT, copying
26062606+it into its SLIST structure.
26072607+26082608+At this point the resolver would need to pick one of the three available
26092609+addresses to try. Given that the resolver is on net 26, it should
26102610+choose either 26.0.0.73 or 26.3.0.103 as its first choice. It would
26112611+then send off a query of the form:
26312631+26322632+26332633+ +---------------------------------------------------+
26342634+ Header | OPCODE=SQUERY |
26352635+ +---------------------------------------------------+
26362636+ Question | QNAME=ISI.EDU., QCLASS=IN, QTYPE=MX |
26372637+ +---------------------------------------------------+
26382638+ Answer | <empty> |
26392639+ +---------------------------------------------------+
26402640+ Authority | <empty> |
26412641+ +---------------------------------------------------+
26422642+ Additional | <empty> |
26432643+ +---------------------------------------------------+
26442644+26452645+The resolver would then wait for a response to its query or a timeout.
26462646+If the timeout occurs, it would try different servers, then different
26472647+addresses of the same servers, lastly retrying addresses already tried.
26482648+It might eventually receive a reply from SRI-NIC.ARPA:
26492649+26502650+ +---------------------------------------------------+
26512651+ Header | OPCODE=SQUERY, RESPONSE |
26522652+ +---------------------------------------------------+
26532653+ Question | QNAME=ISI.EDU., QCLASS=IN, QTYPE=MX |
26542654+ +---------------------------------------------------+
26552655+ Answer | <empty> |
26562656+ +---------------------------------------------------+
26572657+ Authority | ISI.EDU. 172800 IN NS VAXA.ISI.EDU. |
26582658+ | NS A.ISI.EDU. |
26592659+ | NS VENERA.ISI.EDU.|
26602660+ +---------------------------------------------------+
26612661+ Additional | VAXA.ISI.EDU. 172800 A 10.2.0.27 |
26622662+ | 172800 A 128.9.0.33 |
26632663+ | VENERA.ISI.EDU. 172800 A 10.1.0.52 |
26642664+ | 172800 A 128.9.0.32 |
26652665+ | A.ISI.EDU. 172800 A 26.3.0.103 |
26662666+ +---------------------------------------------------+
26672667+26682668+The resolver would notice that the information in the response gave a
26692669+closer delegation to ISI.EDU than its existing SLIST (since it matches
26702670+three labels). The resolver would then cache the information in this
26712671+response and use it to set up a new SLIST:
26722672+26732673+ Match count = 3
26742674+ A.ISI.EDU. 26.3.0.103
26752675+ VAXA.ISI.EDU. 10.2.0.27 128.9.0.33
26762676+ VENERA.ISI.EDU. 10.1.0.52 128.9.0.32
26772677+26782678+A.ISI.EDU appears on this list as well as the previous one, but that is
26792679+purely coincidental. The resolver would again start transmitting and
26802680+waiting for responses. Eventually it would get an answer:
26872687+26882688+26892689+ +---------------------------------------------------+
26902690+ Header | OPCODE=SQUERY, RESPONSE, AA |
26912691+ +---------------------------------------------------+
26922692+ Question | QNAME=ISI.EDU., QCLASS=IN, QTYPE=MX |
26932693+ +---------------------------------------------------+
26942694+ Answer | ISI.EDU. MX 10 VENERA.ISI.EDU. |
26952695+ | MX 20 VAXA.ISI.EDU. |
26962696+ +---------------------------------------------------+
26972697+ Authority | <empty> |
26982698+ +---------------------------------------------------+
26992699+ Additional | VAXA.ISI.EDU. 172800 A 10.2.0.27 |
27002700+ | 172800 A 128.9.0.33 |
27012701+ | VENERA.ISI.EDU. 172800 A 10.1.0.52 |
27022702+ | 172800 A 128.9.0.32 |
27032703+ +---------------------------------------------------+
27042704+27052705+The resolver would add this information to its cache, and return the MX
27062706+RRs to its client.
27072707+27082708+6.3.2. Get the host name for address 26.6.0.65
27092709+27102710+The resolver would translate this into a request for PTR RRs for
27112711+65.0.6.26.IN-ADDR.ARPA. This information is not in the cache, so the
27122712+resolver would look for foreign servers to ask. No servers would match,
27132713+so it would use SBELT again. (Note that the servers for the ISI.EDU
27142714+domain are in the cache, but ISI.EDU is not an ancestor of
27152715+65.0.6.26.IN-ADDR.ARPA, so the SBELT is used.)
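The translation from an address to a PTR query name described above can be sketched in OCaml as follows. This is only an illustration: `reverse_name` is a hypothetical helper, not part of this library or of any resolver, and it handles dotted-quad IPv4 addresses only.

```ocaml
(* Hypothetical helper: build the IN-ADDR.ARPA name for a PTR query
   from a dotted-quad IPv4 address by reversing the four octets. *)
let is_octet (s : string) : bool =
  let digits_only =
    List.for_all (fun c -> c >= '0' && c <= '9') (List.of_seq (String.to_seq s))
  in
  s <> "" && String.length s <= 3 && digits_only && int_of_string s <= 255

let reverse_name (addr : string) : string option =
  match String.split_on_char '.' addr with
  | [ a; b; c; d ] as octets when List.for_all is_octet octets ->
      (* The trailing "" yields the final dot of the absolute name. *)
      Some (String.concat "." [ d; c; b; a; "IN-ADDR"; "ARPA"; "" ])
  | _ -> None
```

With the address from this example, `reverse_name "26.6.0.65"` evaluates to `Some "65.0.6.26.IN-ADDR.ARPA."`.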
27162716+27172717+Since this request is within the authoritative data of both servers in
27182718+SBELT, eventually one would return:
27192719+27202720+27212721+27222722+27232723+27242724+27252725+27262726+27272727+27282728+27292729+27302730+27312731+27322732+27332733+27342734+27352735+27362736+27372737+27382738+27392739+27402740+Mockapetris [Page 49]
27412741+27422742+RFC 1034 Domain Concepts and Facilities November 1987
27432743+27442744+27452745+ +---------------------------------------------------+
27462746+ Header | OPCODE=SQUERY, RESPONSE, AA |
27472747+ +---------------------------------------------------+
27482748+ Question | QNAME=65.0.6.26.IN-ADDR.ARPA.,QCLASS=IN,QTYPE=PTR |
27492749+ +---------------------------------------------------+
27502750+ Answer | 65.0.6.26.IN-ADDR.ARPA. PTR ACC.ARPA. |
27512751+ +---------------------------------------------------+
27522752+ Authority | <empty> |
27532753+ +---------------------------------------------------+
27542754+ Additional | <empty> |
27552755+ +---------------------------------------------------+
27562756+27572757+6.3.3. Get the host address of poneria.ISI.EDU
27582758+27592759+This request would translate into a type A request for poneria.ISI.EDU.
27602760+The resolver would not find any cached data for this name, but would
27612761+find the NS RRs in the cache for ISI.EDU when it looks for foreign
27622762+servers to ask. Using this data, it would construct a SLIST of the
27632763+form:
27642764+27652765+ Match count = 3
27662766+27672767+ A.ISI.EDU. 26.3.0.103
27682768+ VAXA.ISI.EDU. 10.2.0.27 128.9.0.33
27692769+ VENERA.ISI.EDU. 10.1.0.52
27702770+27712771+A.ISI.EDU is listed first on the assumption that the resolver orders its
27722772+choices by preference, and A.ISI.EDU is on the same network.
27732773+27742774+One of these servers would answer the query.
27752775+27762776+7. REFERENCES and BIBLIOGRAPHY
27772777+27782778+[Dyer 87] Dyer, S., and F. Hsu, "Hesiod", Project Athena
27792779+ Technical Plan - Name Service, April 1987, version 1.9.
27802780+27812781+ Describes the fundamentals of the Hesiod name service.
27822782+27832783+[IEN-116] J. Postel, "Internet Name Server", IEN-116,
27842784+ USC/Information Sciences Institute, August 1979.
27852785+27862786+ A name service obsoleted by the Domain Name System, but
27872787+ still in use.
27882788+27892789+27902790+27912791+27922792+27932793+27942794+27952795+27962796+Mockapetris [Page 50]
27972797+27982798+RFC 1034 Domain Concepts and Facilities November 1987
27992799+28002800+28012801+[Quarterman 86] Quarterman, J., and J. Hoskins, "Notable Computer
28022802+ Networks",Communications of the ACM, October 1986,
28032803+ volume 29, number 10.
28042804+28052805+[RFC-742] K. Harrenstien, "NAME/FINGER", RFC-742, Network
28062806+ Information Center, SRI International, December 1977.
28072807+28082808+[RFC-768] J. Postel, "User Datagram Protocol", RFC-768,
28092809+ USC/Information Sciences Institute, August 1980.
28102810+28112811+[RFC-793] J. Postel, "Transmission Control Protocol", RFC-793,
28122812+ USC/Information Sciences Institute, September 1981.
28132813+28142814+[RFC-799] D. Mills, "Internet Name Domains", RFC-799, COMSAT,
28152815+ September 1981.
28162816+28172817+ Suggests introduction of a hierarchy in place of a flat
28182818+ name space for the Internet.
28192819+28202820+[RFC-805] J. Postel, "Computer Mail Meeting Notes", RFC-805,
28212821+ USC/Information Sciences Institute, February 1982.
28222822+28232823+[RFC-810] E. Feinler, K. Harrenstien, Z. Su, and V. White, "DOD
28242824+ Internet Host Table Specification", RFC-810, Network
28252825+ Information Center, SRI International, March 1982.
28262826+28272827+ Obsolete. See RFC-952.
28282828+28292829+[RFC-811] K. Harrenstien, V. White, and E. Feinler, "Hostnames
28302830+ Server", RFC-811, Network Information Center, SRI
28312831+ International, March 1982.
28322832+28332833+ Obsolete. See RFC-953.
28342834+28352835+[RFC-812] K. Harrenstien, and V. White, "NICNAME/WHOIS", RFC-812,
28362836+ Network Information Center, SRI International, March
28372837+ 1982.
28382838+28392839+[RFC-819] Z. Su, and J. Postel, "The Domain Naming Convention for
28402840+ Internet User Applications", RFC-819, Network
28412841+ Information Center, SRI International, August 1982.
28422842+28432843+ Early thoughts on the design of the domain system.
28442844+ Current implementation is completely different.
28452845+28462846+[RFC-821] J. Postel, "Simple Mail Transfer Protocol", RFC-821,
28472847+ USC/Information Sciences Institute, August 1980.
28482848+28492849+28502850+28512851+28522852+Mockapetris [Page 51]
28532853+28542854+RFC 1034 Domain Concepts and Facilities November 1987
28552855+28562856+28572857+[RFC-830] Z. Su, "A Distributed System for Internet Name Service",
28582858+ RFC-830, Network Information Center, SRI International,
28592859+ October 1982.
28602860+28612861+ Early thoughts on the design of the domain system.
28622862+ Current implementation is completely different.
28632863+28642864+[RFC-882] P. Mockapetris, "Domain names - Concepts and
28652865+ Facilities," RFC-882, USC/Information Sciences
28662866+ Institute, November 1983.
28672867+28682868+ Superceeded by this memo.
28692869+28702870+[RFC-883] P. Mockapetris, "Domain names - Implementation and
28712871+ Specification," RFC-883, USC/Information Sciences
28722872+ Institute, November 1983.
28732873+28742874+ Superceeded by this memo.
28752875+28762876+[RFC-920] J. Postel and J. Reynolds, "Domain Requirements",
28772877+ RFC-920, USC/Information Sciences Institute
28782878+ October 1984.
28792879+28802880+ Explains the naming scheme for top level domains.
28812881+28822882+[RFC-952] K. Harrenstien, M. Stahl, E. Feinler, "DoD Internet Host
28832883+ Table Specification", RFC-952, SRI, October 1985.
28842884+28852885+ Specifies the format of HOSTS.TXT, the host/address
28862886+ table replaced by the DNS.
28872887+28882888+[RFC-953] K. Harrenstien, M. Stahl, E. Feinler, "HOSTNAME Server",
28892889+ RFC-953, SRI, October 1985.
28902890+28912891+ This RFC contains the official specification of the
28922892+ hostname server protocol, which is obsoleted by the DNS.
28932893+ This TCP based protocol accesses information stored in
28942894+ the RFC-952 format, and is used to obtain copies of the
28952895+ host table.
28962896+28972897+[RFC-973] P. Mockapetris, "Domain System Changes and
28982898+ Observations", RFC-973, USC/Information Sciences
28992899+ Institute, January 1986.
29002900+29012901+ Describes changes to RFC-882 and RFC-883 and reasons for
29022902+ them. Now obsolete.
29032903+29042904+29052905+29062906+29072907+29082908+Mockapetris [Page 52]
29092909+29102910+RFC 1034 Domain Concepts and Facilities November 1987
29112911+29122912+29132913+[RFC-974] C. Partridge, "Mail routing and the domain system",
29142914+ RFC-974, CSNET CIC BBN Labs, January 1986.
29152915+29162916+ Describes the transition from HOSTS.TXT based mail
29172917+ addressing to the more powerful MX system used with the
29182918+ domain system.
29192919+29202920+[RFC-1001] NetBIOS Working Group, "Protocol standard for a NetBIOS
29212921+ service on a TCP/UDP transport: Concepts and Methods",
29222922+ RFC-1001, March 1987.
29232923+29242924+ This RFC and RFC-1002 are a preliminary design for
29252925+ NETBIOS on top of TCP/IP which proposes to base NetBIOS
29262926+ name service on top of the DNS.
29272927+29282928+[RFC-1002] NetBIOS Working Group, "Protocol standard for a NetBIOS
29292929+ service on a TCP/UDP transport: Detailed
29302930+ Specifications", RFC-1002, March 1987.
29312931+29322932+[RFC-1010] J. Reynolds and J. Postel, "Assigned Numbers", RFC-1010,
29332933+ USC/Information Sciences Institute, May 1987
29342934+29352935+ Contains socket numbers and mnemonics for host names,
29362936+ operating systems, etc.
29372937+29382938+[RFC-1031] W. Lazear, "MILNET Name Domain Transition", RFC-1031,
29392939+ November 1987.
29402940+29412941+ Describes a plan for converting the MILNET to the DNS.
29422942+29432943+[RFC-1032] M. K. Stahl, "Establishing a Domain - Guidelines for
29442944+ Administrators", RFC-1032, November 1987.
29452945+29462946+ Describes the registration policies used by the NIC to
29472947+ administer the top level domains and delegate subzones.
29482948+29492949+[RFC-1033] M. K. Lottor, "Domain Administrators Operations Guide",
29502950+ RFC-1033, November 1987.
29512951+29522952+ A cookbook for domain administrators.
29532953+29542954+[Solomon 82] M. Solomon, L. Landweber, and D. Neuhengen, "The CSNET
29552955+ Name Server", Computer Networks, vol 6, nr 3, July 1982.
29562956+29572957+ Describes a name service for CSNET which is independent
29582958+ from the DNS and DNS use in the CSNET.
29592959+29602960+29612961+29622962+29632963+29642964+Mockapetris [Page 53]
29652965+29662966+RFC 1034 Domain Concepts and Facilities November 1987
29672967+29682968+29692969+Index
29702970+29712971+ A 12
29722972+ Absolute names 8
29732973+ Aliases 14, 31
29742974+ Authority 6
29752975+ AXFR 17
29762976+29772977+ Case of characters 7
29782978+ CH 12
29792979+ CNAME 12, 13, 31
29802980+ Completion queries 18
29812981+29822982+ Domain name 6, 7
29832983+29842984+ Glue RRs 20
29852985+29862986+ HINFO 12
29872987+29882988+ IN 12
29892989+ Inverse queries 16
29902990+ Iterative 4
29912991+29922992+ Label 7
29932993+29942994+ Mailbox names 9
29952995+ MX 12
29962996+29972997+ Name error 27, 36
29982998+ Name servers 5, 17
29992999+ NE 30
30003000+ Negative caching 44
30013001+ NS 12
30023002+30033003+ Opcode 16
30043004+30053005+ PTR 12
30063006+30073007+ QCLASS 16
30083008+ QTYPE 16
30093009+30103010+ RDATA 13
30113011+ Recursive 4
30123012+ Recursive service 22
30133013+ Relative names 7
30143014+ Resolvers 6
30153015+ RR 12
30163016+30173017+30183018+30193019+30203020+Mockapetris [Page 54]
30213021+30223022+RFC 1034 Domain Concepts and Facilities November 1987
30233023+30243024+30253025+ Safety belt 33
30263026+ Sections 16
30273027+ SOA 12
30283028+ Standard queries 22
30293029+30303030+ Status queries 18
30313031+ Stub resolvers 32
30323032+30333033+ TTL 12, 13
30343034+30353035+ Wildcards 25
30363036+30373037+ Zone transfers 28
30383038+ Zones 19
30393039+30403040+30413041+30423042+30433043+30443044+30453045+30463046+30473047+30483048+30493049+30503050+30513051+30523052+30533053+30543054+30553055+30563056+30573057+30583058+30593059+30603060+30613061+30623062+30633063+30643064+30653065+30663066+30673067+30683068+30693069+30703070+30713071+30723072+30733073+30743074+30753075+30763076+Mockapetris [Page 55]
30773077+
+3077
spec/rfc1035.txt
···11+Network Working Group P. Mockapetris
22+Request for Comments: 1035 ISI
33+ November 1987
44+Obsoletes: RFCs 882, 883, 973
55+66+ DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION
77+88+99+1. STATUS OF THIS MEMO
1010+1111+This RFC describes the details of the domain system and protocol, and
1212+assumes that the reader is familiar with the concepts discussed in a
1313+companion RFC, "Domain Names - Concepts and Facilities" [RFC-1034].
1414+1515+The domain system is a mixture of functions and data types which are an
1616+official protocol and functions and data types which are still
1717+experimental. Since the domain system is intentionally extensible, new
1818+data types and experimental behavior should always be expected in parts
1919+of the system beyond the official protocol. The official protocol parts
2020+include standard queries, responses and the Internet class RR data
2121+formats (e.g., host addresses). Since the previous RFC set, several
2222+definitions have changed, so some previous definitions are obsolete.
2323+2424+Experimental or obsolete features are clearly marked in these RFCs, and
2525+such information should be used with caution.
2626+2727+The reader is especially cautioned not to depend on the values which
2828+appear in examples to be current or complete, since their purpose is
2929+primarily pedagogical. Distribution of this memo is unlimited.
3030+3131+ Table of Contents
3232+3333+ 1. STATUS OF THIS MEMO 1
3434+ 2. INTRODUCTION 3
3535+ 2.1. Overview 3
3636+ 2.2. Common configurations 4
3737+ 2.3. Conventions 7
3838+ 2.3.1. Preferred name syntax 7
3939+ 2.3.2. Data Transmission Order 8
4040+ 2.3.3. Character Case 9
4141+ 2.3.4. Size limits 10
4242+ 3. DOMAIN NAME SPACE AND RR DEFINITIONS 10
4343+ 3.1. Name space definitions 10
4444+ 3.2. RR definitions 11
4545+ 3.2.1. Format 11
4646+ 3.2.2. TYPE values 12
4747+ 3.2.3. QTYPE values 12
4848+ 3.2.4. CLASS values 13
4949+5050+5151+5252+Mockapetris [Page 1]
5353+5454+RFC 1035 Domain Implementation and Specification November 1987
5555+5656+5757+ 3.2.5. QCLASS values 13
5858+ 3.3. Standard RRs 13
5959+ 3.3.1. CNAME RDATA format 14
6060+ 3.3.2. HINFO RDATA format 14
6161+ 3.3.3. MB RDATA format (EXPERIMENTAL) 14
6262+ 3.3.4. MD RDATA format (Obsolete) 15
6363+ 3.3.5. MF RDATA format (Obsolete) 15
6464+ 3.3.6. MG RDATA format (EXPERIMENTAL) 16
6565+ 3.3.7. MINFO RDATA format (EXPERIMENTAL) 16
6666+ 3.3.8. MR RDATA format (EXPERIMENTAL) 17
6767+ 3.3.9. MX RDATA format 17
6868+ 3.3.10. NULL RDATA format (EXPERIMENTAL) 17
6969+ 3.3.11. NS RDATA format 18
7070+ 3.3.12. PTR RDATA format 18
7171+ 3.3.13. SOA RDATA format 19
7272+ 3.3.14. TXT RDATA format 20
7373+ 3.4. ARPA Internet specific RRs 20
7474+ 3.4.1. A RDATA format 20
7575+ 3.4.2. WKS RDATA format 21
7676+ 3.5. IN-ADDR.ARPA domain 22
7777+ 3.6. Defining new types, classes, and special namespaces 24
7878+ 4. MESSAGES 25
7979+ 4.1. Format 25
8080+ 4.1.1. Header section format 26
8181+ 4.1.2. Question section format 28
8282+ 4.1.3. Resource record format 29
8383+ 4.1.4. Message compression 30
8484+ 4.2. Transport 32
8585+ 4.2.1. UDP usage 32
8686+ 4.2.2. TCP usage 32
8787+ 5. MASTER FILES 33
8888+ 5.1. Format 33
8989+ 5.2. Use of master files to define zones 35
9090+ 5.3. Master file example 36
9191+ 6. NAME SERVER IMPLEMENTATION 37
9292+ 6.1. Architecture 37
9393+ 6.1.1. Control 37
9494+ 6.1.2. Database 37
9595+ 6.1.3. Time 39
9696+ 6.2. Standard query processing 39
9797+ 6.3. Zone refresh and reload processing 39
9898+ 6.4. Inverse queries (Optional) 40
9999+ 6.4.1. The contents of inverse queries and responses 40
100100+ 6.4.2. Inverse query and response example 41
101101+ 6.4.3. Inverse query processing 42
102102+103103+104104+105105+106106+107107+108108+Mockapetris [Page 2]
109109+110110+RFC 1035 Domain Implementation and Specification November 1987
111111+112112+113113+ 6.5. Completion queries and responses 42
114114+ 7. RESOLVER IMPLEMENTATION 43
115115+ 7.1. Transforming a user request into a query 43
116116+ 7.2. Sending the queries 44
117117+ 7.3. Processing responses 46
118118+ 7.4. Using the cache 47
119119+ 8. MAIL SUPPORT 47
120120+ 8.1. Mail exchange binding 48
121121+ 8.2. Mailbox binding (Experimental) 48
122122+ 9. REFERENCES and BIBLIOGRAPHY 50
123123+ Index 54
124124+125125+2. INTRODUCTION
126126+127127+2.1. Overview
128128+129129+The goal of domain names is to provide a mechanism for naming resources
130130+in such a way that the names are usable in different hosts, networks,
131131+protocol families, internets, and administrative organizations.
132132+133133+From the user's point of view, domain names are useful as arguments to a
134134+local agent, called a resolver, which retrieves information associated
135135+with the domain name. Thus a user might ask for the host address or
136136+mail information associated with a particular domain name. To enable
137137+the user to request a particular type of information, an appropriate
138138+query type is passed to the resolver with the domain name. To the user,
139139+the domain tree is a single information space; the resolver is
140140+responsible for hiding the distribution of data among name servers from
141141+the user.
142142+143143+From the resolver's point of view, the database that makes up the domain
144144+space is distributed among various name servers. Different parts of the
145145+domain space are stored in different name servers, although a particular
146146+data item will be stored redundantly in two or more name servers. The
147147+resolver starts with knowledge of at least one name server. When the
148148+resolver processes a user query it asks a known name server for the
149149+information; in return, the resolver either receives the desired
150150+information or a referral to another name server. Using these
151151+referrals, resolvers learn the identities and contents of other name
152152+servers. Resolvers are responsible for dealing with the distribution of
153153+the domain space and dealing with the effects of name server failure by
154154+consulting redundant databases in other servers.
155155+156156+Name servers manage two kinds of data. The first kind of data is held in
157157+sets called zones; each zone is the complete database for a particular
158158+"pruned" subtree of the domain space. This data is called
159159+authoritative. A name server periodically checks to make sure that its
160160+zones are up to date, and if not, obtains a new copy of updated zones
161161+162162+163163+164164+Mockapetris [Page 3]
165165+166166+RFC 1035 Domain Implementation and Specification November 1987
167167+168168+169169+from master files stored locally or in another name server. The second
170170+kind of data is cached data which was acquired by a local resolver.
171171+This data may be incomplete, but improves the performance of the
172172+retrieval process when non-local data is repeatedly accessed. Cached
173173+data is eventually discarded by a timeout mechanism.
174174+175175+This functional structure isolates the problems of user interface,
176176+failure recovery, and distribution in the resolvers and isolates the
177177+database update and refresh problems in the name servers.
178178+179179+2.2. Common configurations
180180+181181+A host can participate in the domain name system in a number of ways,
182182+depending on whether the host runs programs that retrieve information
183183+from the domain system, name servers that answer queries from other
184184+hosts, or various combinations of both functions. The simplest, and
185185+perhaps most typical, configuration is shown below:
186186+187187+ Local Host | Foreign
188188+ |
189189+ +---------+ +----------+ | +--------+
190190+ | | user queries | |queries | | |
191191+ | User |-------------->| |---------|->|Foreign |
192192+ | Program | | Resolver | | | Name |
193193+ | |<--------------| |<--------|--| Server |
194194+ | | user responses| |responses| | |
195195+ +---------+ +----------+ | +--------+
196196+ | A |
197197+ cache additions | | references |
198198+ V | |
199199+ +----------+ |
200200+ | cache | |
201201+ +----------+ |
202202+203203+User programs interact with the domain name space through resolvers; the
204204+format of user queries and user responses is specific to the host and
205205+its operating system. User queries will typically be operating system
206206+calls, and the resolver and its cache will be part of the host operating
207207+system. Less capable hosts may choose to implement the resolver as a
208208+subroutine to be linked in with every program that needs its services.
209209+Resolvers answer user queries with information they acquire via queries
210210+to foreign name servers and the local cache.
211211+212212+Note that the resolver may have to make several queries to several
213213+different foreign name servers to answer a particular user query, and
214214+hence the resolution of a user query may involve several network
215215+accesses and an arbitrary amount of time. The queries to foreign name
216216+servers and the corresponding responses have a standard format described
217217+218218+219219+220220+Mockapetris [Page 4]
221221+222222+RFC 1035 Domain Implementation and Specification November 1987
223223+224224+225225+in this memo, and may be datagrams.
226226+227227+Depending on its capabilities, a name server could be a stand alone
228228+program on a dedicated machine or a process or processes on a large
229229+timeshared host. A simple configuration might be:
230230+231231+ Local Host | Foreign
232232+ |
233233+ +---------+ |
234234+ / /| |
235235+ +---------+ | +----------+ | +--------+
236236+ | | | | |responses| | |
237237+ | | | | Name |---------|->|Foreign |
238238+ | Master |-------------->| Server | | |Resolver|
239239+ | files | | | |<--------|--| |
240240+ | |/ | | queries | +--------+
241241+ +---------+ +----------+ |
242242+243243+Here a primary name server acquires information about one or more zones
244244+by reading master files from its local file system, and answers queries
245245+about those zones that arrive from foreign resolvers.
246246+247247+The DNS requires that all zones be redundantly supported by more than
248248+one name server. Designated secondary servers can acquire zones and
249249+check for updates from the primary server using the zone transfer
250250+protocol of the DNS. This configuration is shown below:
251251+252252+ Local Host | Foreign
253253+ |
254254+ +---------+ |
255255+ / /| |
256256+ +---------+ | +----------+ | +--------+
257257+ | | | | |responses| | |
258258+ | | | | Name |---------|->|Foreign |
259259+ | Master |-------------->| Server | | |Resolver|
260260+ | files | | | |<--------|--| |
261261+ | |/ | | queries | +--------+
262262+ +---------+ +----------+ |
263263+ A |maintenance | +--------+
264264+ | +------------|->| |
265265+ | queries | |Foreign |
266266+ | | | Name |
267267+ +------------------|--| Server |
268268+ maintenance responses | +--------+
269269+270270+In this configuration, the name server periodically establishes a
271271+virtual circuit to a foreign name server to acquire a copy of a zone or
272272+to check that an existing copy has not changed. The messages sent for
273273+274274+275275+276276+Mockapetris [Page 5]
277277+278278+RFC 1035 Domain Implementation and Specification November 1987
279279+280280+281281+these maintenance activities follow the same form as queries and
282282+responses, but the message sequences are somewhat different.
283283+284284+The information flow in a host that supports all aspects of the domain
285285+name system is shown below:
286286+287287+ Local Host | Foreign
288288+ |
289289+ +---------+ +----------+ | +--------+
290290+ | | user queries | |queries | | |
291291+ | User |-------------->| |---------|->|Foreign |
292292+ | Program | | Resolver | | | Name |
293293+ | |<--------------| |<--------|--| Server |
294294+ | | user responses| |responses| | |
295295+ +---------+ +----------+ | +--------+
296296+ | A |
297297+ cache additions | | references |
298298+ V | |
299299+ +----------+ |
300300+ | Shared | |
301301+ | database | |
302302+ +----------+ |
303303+ A | |
304304+ +---------+ refreshes | | references |
305305+ / /| | V |
306306+ +---------+ | +----------+ | +--------+
307307+ | | | | |responses| | |
308308+ | | | | Name |---------|->|Foreign |
309309+ | Master |-------------->| Server | | |Resolver|
310310+ | files | | | |<--------|--| |
311311+ | |/ | | queries | +--------+
312312+ +---------+ +----------+ |
313313+ A |maintenance | +--------+
314314+ | +------------|->| |
315315+ | queries | |Foreign |
316316+ | | | Name |
317317+ +------------------|--| Server |
318318+ maintenance responses | +--------+
319319+320320+The shared database holds domain space data for the local name server
321321+and resolver. The contents of the shared database will typically be a
322322+mixture of authoritative data maintained by the periodic refresh
323323+operations of the name server and cached data from previous resolver
324324+requests. The structure of the domain data and the necessity for
325325+synchronization between name servers and resolvers imply the general
326326+characteristics of this database, but the actual format is up to the
327327+local implementor.
328328+329329+330330+331331+332332+Mockapetris [Page 6]
333333+334334+RFC 1035 Domain Implementation and Specification November 1987
335335+336336+337337+Information flow can also be tailored so that a group of hosts act
338338+together to optimize activities. Sometimes this is done to offload less
339339+capable hosts so that they do not have to implement a full resolver.
340340+This can be appropriate for PCs or hosts which want to minimize the
341341+amount of new network code which is required. This scheme can also
342342+allow a group of hosts to share a small number of caches rather than
343343+maintaining a large number of separate caches, on the premise that the
344344+centralized caches will have a higher hit ratio. In either case,
345345+resolvers are replaced with stub resolvers which act as front ends to
346346+resolvers located in a recursive server in one or more name servers
347347+known to perform that service:
348348+349349+ Local Hosts | Foreign
350350+ |
351351+ +---------+ |
352352+ | | responses |
353353+ | Stub |<--------------------+ |
354354+ | Resolver| | |
355355+ | |----------------+ | |
356356+ +---------+ recursive | | |
357357+ queries | | |
358358+ V | |
359359+ +---------+ recursive +----------+ | +--------+
360360+ | | queries | |queries | | |
361361+ | Stub |-------------->| Recursive|---------|->|Foreign |
362362+ | Resolver| | Server | | | Name |
363363+ | |<--------------| |<--------|--| Server |
364364+ +---------+ responses | |responses| | |
365365+ +----------+ | +--------+
366366+ | Central | |
367367+ | cache | |
368368+ +----------+ |
369369+370370+In any case, note that domain components are always replicated for
371371+reliability whenever possible.
372372+373373+2.3. Conventions
374374+375375+The domain system has several conventions dealing with low-level, but
376376+fundamental, issues. While the implementor is free to violate these
377377+conventions WITHIN HIS OWN SYSTEM, he must observe these conventions in
378378+ALL behavior observed from other hosts.
379379+380380+2.3.1. Preferred name syntax
381381+382382+The DNS specifications attempt to be as general as possible in the rules
383383+for constructing domain names. The idea is that the name of any
384384+existing object can be expressed as a domain name with minimal changes.
385385+386386+387387+388388+Mockapetris [Page 7]
389389+390390+RFC 1035 Domain Implementation and Specification November 1987
391391+392392+393393+However, when assigning a domain name for an object, the prudent user
394394+will select a name which satisfies both the rules of the domain system
395395+and any existing rules for the object, whether these rules are published
396396+or implied by existing programs.
397397+398398+For example, when naming a mail domain, the user should satisfy both the
399399+rules of this memo and those in RFC-822. When creating a new host name,
400400+the old rules for HOSTS.TXT should be followed. This avoids problems
401401+when old software is converted to use domain names.
402402+403403+The following syntax will result in fewer problems with many
404404+405405+applications that use domain names (e.g., mail, TELNET).
406406+407407+<domain> ::= <subdomain> | " "
408408+409409+<subdomain> ::= <label> | <subdomain> "." <label>
410410+411411+<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
412412+413413+<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
414414+415415+<let-dig-hyp> ::= <let-dig> | "-"
416416+417417+<let-dig> ::= <letter> | <digit>
418418+419419+<letter> ::= any one of the 52 alphabetic characters A through Z in
420420+upper case and a through z in lower case
421421+422422+<digit> ::= any one of the ten digits 0 through 9
423423+424424+Note that while upper and lower case letters are allowed in domain
425425+names, no significance is attached to the case. That is, two names with
426426+the same spelling but different case are to be treated as if identical.
427427+428428+The labels must follow the rules for ARPANET host names. They must
429429+start with a letter, end with a letter or digit, and have as interior
430430+characters only letters, digits, and hyphen. There are also some
431431+restrictions on the length. Labels must be 63 characters or less.
432432+433433+For example, the following strings identify hosts in the Internet:
434434+435435+A.ISI.EDU XX.LCS.MIT.EDU SRI-NIC.ARPA
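The preferred syntax above can be checked with a short OCaml sketch. The function names are hypothetical and are not taken from this library. The sketch covers only the `<subdomain>` production: it rejects the blank root name and names written with a trailing dot.

```ocaml
let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_digit c = c >= '0' && c <= '9'
let is_let_dig c = is_letter c || is_digit c

(* <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ], 63 characters or less *)
let is_preferred_label (s : string) : bool =
  let n = String.length s in
  let all_ldh =
    List.for_all (fun c -> is_let_dig c || c = '-') (List.of_seq (String.to_seq s))
  in
  n >= 1 && n <= 63 && is_letter s.[0] && is_let_dig s.[n - 1] && all_ldh

(* <subdomain> ::= <label> | <subdomain> "." <label> *)
let is_preferred_name (s : string) : bool =
  List.for_all is_preferred_label (String.split_on_char '.' s)
```

All three example host names above pass this check, while a name whose label starts with a digit or ends with a hyphen does not.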
436436+437437+2.3.2. Data Transmission Order
438438+439439+The order of transmission of the header and data described in this
440440+document is resolved to the octet level. Whenever a diagram shows a
441441+442442+443443+444444+Mockapetris [Page 8]
445445+446446+RFC 1035 Domain Implementation and Specification November 1987
447447+448448+449449+group of octets, the order of transmission of those octets is the normal
450450+order in which they are read in English. For example, in the following
451451+diagram, the octets are transmitted in the order they are numbered.
452452+453453+ 0 1
454454+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
455455+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
456456+ | 1 | 2 |
457457+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
458458+ | 3 | 4 |
459459+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
460460+ | 5 | 6 |
461461+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
462462+463463+Whenever an octet represents a numeric quantity, the left most bit in
464464+the diagram is the high order or most significant bit. That is, the bit
465465+labeled 0 is the most significant bit. For example, the following
466466+diagram represents the value 170 (decimal).
467467+468468+ 0 1 2 3 4 5 6 7
469469+ +-+-+-+-+-+-+-+-+
470470+ |1 0 1 0 1 0 1 0|
471471+ +-+-+-+-+-+-+-+-+
472472+473473+Similarly, whenever a multi-octet field represents a numeric quantity
474474+the left most bit of the whole field is the most significant bit. When
475475+a multi-octet quantity is transmitted the most significant octet is
476476+transmitted first.
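As an illustration of this ordering, the following OCaml sketch writes and reads a 16-bit quantity with the most significant octet first. The helper names `put_u16` and `get_u16` are assumptions made for this example.

```ocaml
(* Append a 16-bit quantity to [buf], most significant octet first. *)
let put_u16 (buf : Buffer.t) (v : int) : unit =
  Buffer.add_char buf (Char.chr ((v lsr 8) land 0xff));
  Buffer.add_char buf (Char.chr (v land 0xff))

(* Read a 16-bit quantity stored at offset [off] in the same order. *)
let get_u16 (s : string) (off : int) : int =
  ((Char.code s.[off]) lsl 8) lor (Char.code s.[off + 1])
```

For example, writing the value 0x01AA produces the octet 0x01 followed by the octet 0xAA, which is the bit pattern for 170 shown in the diagram above.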
477477+478478+2.3.3. Character Case
479479+480480+For all parts of the DNS that are part of the official protocol, all
481481+comparisons between character strings (e.g., labels, domain names, etc.)
482482+are done in a case-insensitive manner. At present, this rule is in
483483+force throughout the domain system without exception. However, future
484484+additions beyond current usage may need to use the full binary octet
485485+capabilities in names, so attempts to store domain names in 7-bit ASCII
486486+or use of special bytes to terminate labels, etc., should be avoided.
487487+488488+When data enters the domain system, its original case should be
489489+preserved whenever possible. In certain circumstances this cannot be
490490+done. For example, if two RRs are stored in a database, one at x.y and
491491+one at X.Y, they are actually stored at the same place in the database,
492492+and hence only one casing would be preserved. The basic rule is that
493493+case can be discarded only when data is used to define structure in a
494494+database, and two names are identical when compared in a case
495495+insensitive manner.
496496+497497+498498+499499+500500+Mockapetris [Page 9]
501501+502502+RFC 1035 Domain Implementation and Specification November 1987
503503+504504+505505+Loss of case sensitive data must be minimized. Thus while data for x.y
506506+and X.Y may both be stored under a single location x.y or X.Y, data for
507507+a.x and B.X would never be stored under A.x, A.X, b.x, or b.X. In
508508+general, this preserves the case of the first label of a domain name,
509509+but forces standardization of interior node labels.
510510+511511+Systems administrators who enter data into the domain database should
512512+take care to represent the data they supply to the domain system in a
513513+case-consistent manner if their system is case-sensitive. The data
514514+distribution system in the domain system will ensure that consistent
515515+representations are preserved.
516516+517517+2.3.4. Size limits
518518+519519+Various objects and parameters in the DNS have size limits. They are
520520+listed below. Some could be easily changed, others are more
521521+fundamental.
522522+523523+labels 63 octets or less
524524+525525+names 255 octets or less
526526+527527+TTL positive values of a signed 32 bit number.
528528+529529+UDP messages 512 octets or less
530530+531531+3. DOMAIN NAME SPACE AND RR DEFINITIONS
532532+533533+3.1. Name space definitions
534534+535535+Domain names in messages are expressed in terms of a sequence of labels.
536536+Each label is represented as a one octet length field followed by that
537537+number of octets. Since every domain name ends with the null label of
538538+the root, a domain name is terminated by a length byte of zero. The
539539+high order two bits of every length octet must be zero, and the
540540+remaining six bits of the length field limit the label to 63 octets or
541541+less.
542542+543543+To simplify implementations, the total length of a domain name (i.e.,
544544+label octets and label length octets) is restricted to 255 octets or
545545+less.
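The label layout and the two size limits above can be sketched in OCaml as follows. This is an illustrative helper, not part of the memo or of any existing library: the name `encode_name` and the error strings are assumptions, and escapes and message compression are not handled.

```ocaml
(* Hypothetical helper: encode a list of labels as described in 3.1.
   Each label becomes one length octet followed by the label octets;
   the name ends with the zero-length label of the root. *)
let encode_name (labels : string list) : (string, string) result =
  let buf = Buffer.create 64 in
  let rec go = function
    | [] ->
        (* Terminating length byte of zero for the root label. *)
        Buffer.add_char buf '\000';
        (* The total counts label octets and label length octets. *)
        if Buffer.length buf > 255 then Error "name exceeds 255 octets"
        else Ok (Buffer.contents buf)
    | l :: rest ->
        let n = String.length l in
        if n = 0 then Error "empty label"
        else if n > 63 then Error "label exceeds 63 octets"
        else begin
          (* n <= 63, so the high order two bits of the length are zero. *)
          Buffer.add_char buf (Char.chr n);
          Buffer.add_string buf l;
          go rest
        end
  in
  go labels
```

For example, `encode_name ["www"; "example"; "com"]` yields a 17 octet string: three length-prefixed labels followed by the zero octet of the root.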
546546+547547+Although labels can contain any 8 bit values in octets that make up a
548548+label, it is strongly recommended that labels follow the preferred
549549+syntax described elsewhere in this memo, which is compatible with
550550+existing host naming conventions. Name servers and resolvers must
551551+compare labels in a case-insensitive manner (i.e., A=a), assuming ASCII
552552+with zero parity. Non-alphabetic codes must match exactly.
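A minimal sketch of the comparison rule, assuming labels are held as OCaml strings of raw octets. The function name `label_equal` is hypothetical. Only the ASCII letters A-Z are folded, so every other octet value must match exactly.

```ocaml
(* Hypothetical helper: compare two labels the way 3.1 requires.
   String.lowercase_ascii folds only 'A'..'Z'; all other octets,
   including those with the high bit set, are left untouched. *)
let label_equal (a : string) (b : string) : bool =
  String.lowercase_ascii a = String.lowercase_ascii b
```

Folding to lower case is used here only for the comparison; the stored label keeps its original case, as section 2.3.3 asks.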
559559+560560+561561+3.2. RR definitions
562562+563563+3.2.1. Format
564564+565565+All RRs have the same top level format shown below:
566566+567567+ 1 1 1 1 1 1
568568+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
569569+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
570570+ | |
571571+ / /
572572+ / NAME /
573573+ | |
574574+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
575575+ | TYPE |
576576+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
577577+ | CLASS |
578578+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
579579+ | TTL |
580580+ | |
581581+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
582582+ | RDLENGTH |
583583+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
584584+ / RDATA /
585585+ / /
586586+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
587587+588588+589589+where:
590590+591591+NAME an owner name, i.e., the name of the node to which this
592592+ resource record pertains.
593593+594594+TYPE two octets containing one of the RR TYPE codes.
595595+596596+CLASS two octets containing one of the RR CLASS codes.
597597+598598+TTL a 32 bit signed integer that specifies the time interval
599599+ that the resource record may be cached before the source
600600+ of the information should again be consulted. Zero
601601+ values are interpreted to mean that the RR can only be
602602+ used for the transaction in progress, and should not be
603603+ cached. For example, SOA records are always distributed
604604+ with a zero TTL to prohibit caching. Zero values can
605605+ also be used for extremely volatile data.
606606+607607+RDLENGTH an unsigned 16 bit integer that specifies the length in
608608+ octets of the RDATA field.
615615+616616+617617+RDATA a variable length string of octets that describes the
618618+ resource. The format of this information varies
619619+ according to the TYPE and CLASS of the resource record.
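The top level RR format can be sketched as an OCaml record plus an encoder. The record type `rr`, its field names, and `encode_rr` are assumptions made for illustration. The owner name is taken as already encoded in wire format, and multi-octet fields are written most significant octet first, as the memo specifies for transmission order.

```ocaml
(* Hypothetical record mirroring the fields of the RR diagram. *)
type rr = {
  owner : string;     (* NAME, already in wire format *)
  rr_type : int;      (* TYPE, two octets *)
  rr_class : int;     (* CLASS, two octets *)
  ttl : int32;        (* TTL, 32 bits *)
  rdata : string;     (* RDATA, raw octets *)
}

(* Serialise one RR; RDLENGTH is derived from the RDATA length. *)
let encode_rr (r : rr) : (string, string) result =
  let n = String.length r.rdata in
  if n > 0xFFFF then Error "RDATA exceeds 65535 octets"
  else begin
    let b = Buffer.create (String.length r.owner + 10 + n) in
    Buffer.add_string b r.owner;
    Buffer.add_uint16_be b r.rr_type;
    Buffer.add_uint16_be b r.rr_class;
    Buffer.add_int32_be b r.ttl;
    Buffer.add_uint16_be b n;          (* RDLENGTH *)
    Buffer.add_string b r.rdata;
    Ok (Buffer.contents b)
  end
```

`Buffer.add_uint16_be` and `Buffer.add_int32_be` require OCaml 4.08 or later.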
620620+621621+3.2.2. TYPE values
622622+623623+TYPE fields are used in resource records. Note that these types are a
624624+subset of QTYPEs.
625625+626626+TYPE value and meaning
627627+628628+A 1 a host address
629629+630630+NS 2 an authoritative name server
631631+632632+MD 3 a mail destination (Obsolete - use MX)
633633+634634+MF 4 a mail forwarder (Obsolete - use MX)
635635+636636+CNAME 5 the canonical name for an alias
637637+638638+SOA 6 marks the start of a zone of authority
639639+640640+MB 7 a mailbox domain name (EXPERIMENTAL)
641641+642642+MG 8 a mail group member (EXPERIMENTAL)
643643+644644+MR 9 a mail rename domain name (EXPERIMENTAL)
645645+646646+NULL 10 a null RR (EXPERIMENTAL)
647647+648648+WKS 11 a well known service description
649649+650650+PTR 12 a domain name pointer
651651+652652+HINFO 13 host information
653653+654654+MINFO 14 mailbox or mail list information
655655+656656+MX 15 mail exchange
657657+658658+TXT 16 text strings
659659+660660+3.2.3. QTYPE values
661661+662662+QTYPE fields appear in the question part of a query. QTYPES are a
663663+superset of TYPEs, hence all TYPEs are valid QTYPEs. In addition, the
664664+following QTYPEs are defined:
671671+672672+673673+AXFR 252 A request for a transfer of an entire zone
674674+675675+MAILB 253 A request for mailbox-related records (MB, MG or MR)
676676+677677+MAILA 254 A request for mail agent RRs (Obsolete - see MX)
678678+679679+* 255 A request for all records
680680+681681+3.2.4. CLASS values
682682+683683+CLASS fields appear in resource records. The following CLASS mnemonics
684684+and values are defined:
685685+686686+IN 1 the Internet
687687+688688+CS 2 the CSNET class (Obsolete - used only for examples in
689689+ some obsolete RFCs)
690690+691691+CH 3 the CHAOS class
692692+693693+HS 4 Hesiod [Dyer 87]
694694+695695+3.2.5. QCLASS values
696696+697697+QCLASS fields appear in the question section of a query. QCLASS values
698698+are a superset of CLASS values; every CLASS is a valid QCLASS. In
699699+addition to CLASS values, the following QCLASSes are defined:
700700+701701+* 255 any class
702702+703703+3.3. Standard RRs
704704+705705+The following RR definitions are expected to occur, at least
706706+potentially, in all classes. In particular, NS, SOA, CNAME, and PTR
707707+will be used in all classes, and have the same format in all classes.
708708+Because their RDATA format is known, all domain names in the RDATA
709709+section of these RRs may be compressed.
710710+711711+<domain-name> is a domain name represented as a series of labels, and
712712+terminated by a label with zero length. <character-string> is a single
713713+length octet followed by that number of characters. <character-string>
714714+is treated as binary information, and can be up to 256 characters in
715715+length (including the length octet).
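As a sketch of the <character-string> layout, the hypothetical function below prefixes the data with its length octet. Since the total may be at most 256 octets including that length octet, the data itself is limited to 255 octets.

```ocaml
(* Hypothetical helper: one length octet followed by that many octets. *)
let encode_character_string (s : string) : (string, string) result =
  let n = String.length s in
  if n > 255 then Error "character-string exceeds 255 data octets"
  else Ok (String.make 1 (Char.chr n) ^ s)
```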
727727+728728+729729+3.3.1. CNAME RDATA format
730730+731731+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
732732+ / CNAME /
733733+ / /
734734+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
735735+736736+where:
737737+738738+CNAME A <domain-name> which specifies the canonical or primary
739739+ name for the owner. The owner name is an alias.
740740+741741+CNAME RRs cause no additional section processing, but name servers may
742742+choose to restart the query at the canonical name in certain cases. See
743743+the description of name server logic in [RFC-1034] for details.
744744+745745+3.3.2. HINFO RDATA format
746746+747747+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
748748+ / CPU /
749749+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
750750+ / OS /
751751+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
752752+753753+where:
754754+755755+CPU A <character-string> which specifies the CPU type.
756756+757757+OS A <character-string> which specifies the operating
758758+ system type.
759759+760760+Standard values for CPU and OS can be found in [RFC-1010].
761761+762762+HINFO records are used to acquire general information about a host. The
763763+main use is for protocols such as FTP that can use special procedures
764764+when talking between machines or operating systems of the same type.
765765+766766+3.3.3. MB RDATA format (EXPERIMENTAL)
767767+768768+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
769769+ / MADNAME /
770770+ / /
771771+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
772772+773773+where:
774774+775775+MADNAME A <domain-name> which specifies a host which has the
776776+ specified mailbox.
783783+784784+785785+MB records cause additional section processing which looks up an A type
786786+record corresponding to MADNAME.
787787+788788+3.3.4. MD RDATA format (Obsolete)
789789+790790+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
791791+ / MADNAME /
792792+ / /
793793+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
794794+795795+where:
796796+797797+MADNAME A <domain-name> which specifies a host which has a mail
798798+ agent for the domain which should be able to deliver
799799+ mail for the domain.
800800+801801+MD records cause additional section processing which looks up an A type
802802+record corresponding to MADNAME.
803803+804804+MD is obsolete. See the definition of MX and [RFC-974] for details of
805805+the new scheme. The recommended policy for dealing with MD RRs found in
806806+a master file is to reject them, or to convert them to MX RRs with a
807807+preference of 0.
808808+809809+3.3.5. MF RDATA format (Obsolete)
810810+811811+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
812812+ / MADNAME /
813813+ / /
814814+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
815815+816816+where:
817817+818818+MADNAME A <domain-name> which specifies a host which has a mail
819819+ agent for the domain which will accept mail for
820820+ forwarding to the domain.
821821+822822+MF records cause additional section processing which looks up an A type
823823+record corresponding to MADNAME.
825825+MF is obsolete. See the definition of MX and [RFC-974] for details of
826826+the new scheme. The recommended policy for dealing with MF RRs found in
827827+a master file is to reject them, or to convert them to MX RRs with a
828828+preference of 10.
839839+840840+841841+3.3.6. MG RDATA format (EXPERIMENTAL)
842842+843843+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
844844+ / MGMNAME /
845845+ / /
846846+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
847847+848848+where:
849849+850850+MGMNAME A <domain-name> which specifies a mailbox which is a
851851+ member of the mail group specified by the domain name.
852852+853853+MG records cause no additional section processing.
854854+855855+3.3.7. MINFO RDATA format (EXPERIMENTAL)
856856+857857+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
858858+ / RMAILBX /
859859+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
860860+ / EMAILBX /
861861+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
862862+863863+where:
864864+865865+RMAILBX A <domain-name> which specifies a mailbox which is
866866+ responsible for the mailing list or mailbox. If this
867867+ domain name names the root, the owner of the MINFO RR is
868868+ responsible for itself. Note that many existing mailing
869869+ lists use a mailbox X-request for the RMAILBX field of
870870+ mailing list X, e.g., Msgroup-request for Msgroup. This
871871+ field provides a more general mechanism.
872872+873873+874874+EMAILBX A <domain-name> which specifies a mailbox which is to
875875+ receive error messages related to the mailing list or
876876+ mailbox specified by the owner of the MINFO RR (similar
877877+ to the ERRORS-TO: field which has been proposed). If
878878+ this domain name names the root, errors should be
879879+ returned to the sender of the message.
880880+881881+MINFO records cause no additional section processing. Although these
882882+records can be associated with a simple mailbox, they are usually used
883883+with a mailing list.
895895+896896+897897+3.3.8. MR RDATA format (EXPERIMENTAL)
898898+899899+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
900900+ / NEWNAME /
901901+ / /
902902+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
903903+904904+where:
905905+906906+NEWNAME A <domain-name> which specifies a mailbox which is the
907907+ proper rename of the specified mailbox.
908908+909909+MR records cause no additional section processing. The main use for MR
910910+is as a forwarding entry for a user who has moved to a different
911911+mailbox.
912912+913913+3.3.9. MX RDATA format
914914+915915+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
916916+ | PREFERENCE |
917917+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
918918+ / EXCHANGE /
919919+ / /
920920+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
921921+922922+where:
923923+924924+PREFERENCE A 16 bit integer which specifies the preference given to
925925+ this RR among others at the same owner. Lower values
926926+ are preferred.
927927+928928+EXCHANGE A <domain-name> which specifies a host willing to act as
929929+ a mail exchange for the owner name.
930930+931931+MX records cause type A additional section processing for the host
932932+specified by EXCHANGE. The use of MX RRs is explained in detail in
933933+[RFC-974].
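The preference rule can be illustrated by ordering a list of MX records. This is a sketch under the assumption that each record is held as a (preference, exchange) pair; `sort_mx` is a hypothetical name, and how a mailer treats equal preferences is left to [RFC-974].

```ocaml
(* Hypothetical helper: order MX records so that lower PREFERENCE
   values come first. A stable sort keeps records of equal preference
   in their original relative order. *)
let sort_mx (mxs : (int * string) list) : (int * string) list =
  List.stable_sort (fun (p1, _) (p2, _) -> compare p1 p2) mxs
```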
934934+935935+3.3.10. NULL RDATA format (EXPERIMENTAL)
936936+937937+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
938938+ / <anything> /
939939+ / /
940940+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
941941+942942+Anything at all may be in the RDATA field so long as it is 65535 octets
943943+or less.
951951+952952+953953+NULL records cause no additional section processing. NULL RRs are not
954954+allowed in master files. NULLs are used as placeholders in some
955955+experimental extensions of the DNS.
956956+957957+3.3.11. NS RDATA format
958958+959959+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
960960+ / NSDNAME /
961961+ / /
962962+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
963963+964964+where:
965965+966966+NSDNAME A <domain-name> which specifies a host which should be
967967+ authoritative for the specified class and domain.
968968+969969+NS records cause both the usual additional section processing to locate
970970+a type A record, and, when used in a referral, a special search of the
971971+zone in which they reside for glue information.
972972+973973+The NS RR states that the named host should be expected to have a zone
974974+starting at owner name of the specified class. Note that the class may
975975+not indicate the protocol family which should be used to communicate
976976+with the host, although it is typically a strong hint. For example,
977977+hosts which are name servers for either Internet (IN) or Hesiod (HS)
978978+class information are normally queried using IN class protocols.
979979+980980+3.3.12. PTR RDATA format
981981+982982+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
983983+ / PTRDNAME /
984984+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
985985+986986+where:
987987+988988+PTRDNAME A <domain-name> which points to some location in the
989989+ domain name space.
990990+991991+PTR records cause no additional section processing. These RRs are used
992992+in special domains to point to some other location in the domain space.
993993+These records are simple data, and don't imply any special processing
994994+similar to that performed by CNAME, which identifies aliases. See the
995995+description of the IN-ADDR.ARPA domain for an example.
10071007+10081008+10091009+3.3.13. SOA RDATA format
10101010+10111011+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
10121012+ / MNAME /
10131013+ / /
10141014+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
10151015+ / RNAME /
10161016+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
10171017+ | SERIAL |
10181018+ | |
10191019+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
10201020+ | REFRESH |
10211021+ | |
10221022+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
10231023+ | RETRY |
10241024+ | |
10251025+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
10261026+ | EXPIRE |
10271027+ | |
10281028+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
10291029+ | MINIMUM |
10301030+ | |
10311031+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
10321032+10331033+where:
10341034+10351035+MNAME The <domain-name> of the name server that was the
10361036+ original or primary source of data for this zone.
10371037+10381038+RNAME A <domain-name> which specifies the mailbox of the
10391039+ person responsible for this zone.
10401040+10411041+SERIAL The unsigned 32 bit version number of the original copy
10421042+ of the zone. Zone transfers preserve this value. This
10431043+ value wraps and should be compared using sequence space
10441044+ arithmetic.
10451045+10461046+REFRESH A 32 bit time interval before the zone should be
10471047+ refreshed.
10481048+10491049+RETRY A 32 bit time interval that should elapse before a
10501050+ failed refresh should be retried.
10511051+10521052+EXPIRE A 32 bit time value that specifies the upper limit on
10531053+ the time interval that can elapse before the zone is no
10541054+ longer authoritative.
10631063+10641064+10651065+MINIMUM The unsigned 32 bit minimum TTL field that should be
10661066+ exported with any RR from this zone.
10671067+10681068+SOA records cause no additional section processing.
10691069+10701070+All times are in units of seconds.
10711071+10721072+Most of these fields are pertinent only for name server maintenance
10731073+operations. However, MINIMUM is used in all query operations that
10741074+retrieve RRs from a zone. Whenever a RR is sent in a response to a
10751075+query, the TTL field is set to the maximum of the TTL field from the RR
10761076+and the MINIMUM field in the appropriate SOA. Thus MINIMUM is a lower
10771077+bound on the TTL field for all RRs in a zone. Note that this use of
10781078+MINIMUM should occur when the RRs are copied into the response and not
10791079+when the zone is loaded from a master file or via a zone transfer. The
10801080+reason for this provision is to allow future dynamic update facilities to
10811081+change the SOA RR with known semantics.
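The MINIMUM rule amounts to taking a maximum at the time an RR is copied into a response. A one-line sketch, with hypothetical names and TTLs held as plain integers in seconds:

```ocaml
(* Hypothetical helper: the TTL sent in a response is the larger of the
   RR's own TTL and the MINIMUM field of the zone's SOA, so MINIMUM
   acts as a lower bound. *)
let response_ttl ~(rr_ttl : int) ~(soa_minimum : int) : int =
  max rr_ttl soa_minimum
```

The stored RR is left unchanged; only the copy placed in the response carries the adjusted TTL.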
10821082+10831083+10841084+3.3.14. TXT RDATA format
10851085+10861086+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
10871087+ / TXT-DATA /
10881088+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
10891089+10901090+where:
10911091+10921092+TXT-DATA One or more <character-string>s.
10931093+10941094+TXT RRs are used to hold descriptive text. The semantics of the text
10951095+depends on the domain where it is found.
10961096+10971097+3.4. Internet specific RRs
10981098+10991099+3.4.1. A RDATA format
11001100+11011101+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
11021102+ | ADDRESS |
11031103+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
11041104+11051105+where:
11061106+11071107+ADDRESS A 32 bit Internet address.
11081108+11091109+Hosts that have multiple Internet addresses will have multiple A
11101110+records.
11191119+11201120+11211121+A records cause no additional section processing. The RDATA section of
11221122+an A line in a master file is an Internet address expressed as four
11231123+decimal numbers separated by dots without any imbedded spaces (e.g.,
11241124+"10.2.0.52" or "192.0.5.6").
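A sketch of how the master file form of an A record could be converted to the four octet ADDRESS field. The names `is_digits` and `a_rdata_of_string` are hypothetical, and the checks are an assumption about what a strict parser might accept: exactly four decimal fields of one to three digits, each at most 255, with no embedded spaces.

```ocaml
(* True when s consists of one to three ASCII decimal digits. *)
let is_digits (s : string) : bool =
  let ok = ref (s <> "" && String.length s <= 3) in
  String.iter (fun c -> if c < '0' || c > '9' then ok := false) s;
  !ok

(* Hypothetical helper: "10.2.0.52" becomes the 4 octets 10, 2, 0, 52. *)
let a_rdata_of_string (s : string) : string option =
  match String.split_on_char '.' s with
  | [ _; _; _; _ ] as parts when List.for_all is_digits parts ->
      let nums = List.map int_of_string parts in
      if List.for_all (fun n -> n <= 255) nums then
        Some (String.init 4 (fun i -> Char.chr (List.nth nums i)))
      else None
  | _ -> None
```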
11251125+11261126+3.4.2. WKS RDATA format
11271127+11281128+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
11291129+ | ADDRESS |
11301130+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
11311131+ | PROTOCOL | |
11321132+ +--+--+--+--+--+--+--+--+ |
11331133+ | |
11341134+ / <BIT MAP> /
11351135+ / /
11361136+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
11371137+11381138+where:
11391139+11401140+ADDRESS A 32 bit Internet address
11411141+11421142+PROTOCOL An 8 bit IP protocol number
11431143+11441144+<BIT MAP> A variable length bit map. The bit map must be a
11451145+ multiple of 8 bits long.
11461146+11471147+The WKS record is used to describe the well known services supported by
11481148+a particular protocol on a particular internet address. The PROTOCOL
11491149+field specifies an IP protocol number, and the bit map has one bit per
11501150+port of the specified protocol. The first bit corresponds to port 0,
11511151+the second to port 1, etc. If the bit map does not include a bit for a
11521152+protocol of interest, that bit is assumed zero. The appropriate values
11531153+and mnemonics for ports and protocols are specified in [RFC-1010].
11541154+11551155+For example, if PROTOCOL=TCP (6), the 26th bit corresponds to TCP port
11561156+25 (SMTP). If this bit is set, a SMTP server should be listening on TCP
11571157+port 25; if zero, SMTP service is not supported on the specified
11581158+address.
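The bit map lookup can be sketched as below. The function name `wks_port_set` is hypothetical, and the sketch assumes that the "first bit" is the most significant bit of the first octet, in line with the bit numbering used in the memo's diagrams. Ports beyond the end of the bit map are treated as zero, as the text states.

```ocaml
(* Hypothetical helper: test whether the bit for [port] is set in a
   WKS bit map held as a string of octets. Port 0 is the leftmost
   (most significant) bit of the first octet. *)
let wks_port_set (bitmap : string) (port : int) : bool =
  let byte = port / 8 in
  port >= 0
  && byte < String.length bitmap
  && Char.code bitmap.[byte] land (0x80 lsr (port mod 8)) <> 0
```

With this layout, TCP port 25 is the 26th bit: octet index 3, mask 0x40.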
11591159+11601160+The purpose of WKS RRs is to provide availability information for
11611161+servers for TCP and UDP. If a server supports both TCP and UDP, or has
11621162+multiple Internet addresses, then multiple WKS RRs are used.
11631163+11641164+WKS RRs cause no additional section processing.
11651165+11661166+In master files, both ports and protocols are expressed using mnemonics
11671167+or decimal numbers.
11751175+11761176+11771177+3.5. IN-ADDR.ARPA domain
11781178+11791179+The Internet uses a special domain to support gateway location and
11801180+Internet address to host mapping. Other classes may employ a similar
11811181+strategy in other domains. The intent of this domain is to provide a
11821182+guaranteed method to perform host address to host name mapping, and to
11831183+facilitate queries to locate all gateways on a particular network in the
11841184+Internet.
11851185+11861186+Note that both of these services are similar to functions that could be
11871187+performed by inverse queries; the difference is that this part of the
11881188+domain name space is structured according to address, and hence can
11891189+guarantee that the appropriate data can be located without an exhaustive
11901190+search of the domain space.
11911191+11921192+The domain begins at IN-ADDR.ARPA and has a substructure which follows
11931193+the Internet addressing structure.
11941194+11951195+Domain names in the IN-ADDR.ARPA domain are defined to have up to four
11961196+labels in addition to the IN-ADDR.ARPA suffix. Each label represents
11971197+one octet of an Internet address, and is expressed as a character string
11981198+for a decimal value in the range 0-255 (with leading zeros omitted
11991199+except in the case of a zero octet which is represented by a single
12001200+zero).
12011201+12021202+Host addresses are represented by domain names that have all four labels
12031203+specified. Thus data for Internet address 10.2.0.52 is located at
12041204+domain name 52.0.2.10.IN-ADDR.ARPA. The reversal, though awkward to
12051205+read, allows zones to be delegated which are exactly one network of
12061206+address space. For example, 10.IN-ADDR.ARPA can be a zone containing
12071207+data for the ARPANET, while 26.IN-ADDR.ARPA can be a separate zone for
12081208+MILNET. Address nodes are used to hold pointers to primary host names
12091209+in the normal domain space.
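The reversal described above can be sketched as a small OCaml function. The name `in_addr_arpa_name` is hypothetical, and the address is assumed to arrive as a tuple of four octets, each in the range 0-255.

```ocaml
(* Hypothetical helper: build the IN-ADDR.ARPA name for a host address.
   The octets are written in reverse order; "%d" prints each one in
   decimal without leading zeros, and a zero octet as a single "0". *)
let in_addr_arpa_name ((a, b, c, d) : int * int * int * int) : string =
  Printf.sprintf "%d.%d.%d.%d.IN-ADDR.ARPA" d c b a
```

Applied to the example in the text, `in_addr_arpa_name (10, 2, 0, 52)` gives `52.0.2.10.IN-ADDR.ARPA`.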
12101210+12111211+Network numbers correspond to some non-terminal nodes at various depths
12121212+in the IN-ADDR.ARPA domain, since Internet network numbers are either 1,
12131213+2, or 3 octets. Network nodes are used to hold pointers to the primary
12141214+host names of gateways attached to that network. Since a gateway is, by
12151215+definition, on more than one network, it will typically have two or more
12161216+network nodes which point at it. Gateways will also have host level
12171217+pointers at their fully qualified addresses.
12181218+12191219+Both the gateway pointers at network nodes and the normal host pointers
12201220+at full address nodes use the PTR RR to point back to the primary domain
12211221+names of the corresponding hosts.
12221222+12231223+For example, the IN-ADDR.ARPA domain will contain information about the
12241224+ISI gateway between net 10 and 26, an MIT gateway from net 10 to MIT's
12311231+12321232+12331233+net 18, and hosts A.ISI.EDU and MULTICS.MIT.EDU. Assuming that ISI
12341234+gateway has addresses 10.2.0.22 and 26.0.0.103, and a name MILNET-
12351235+GW.ISI.EDU, and the MIT gateway has addresses 10.0.0.77 and 18.10.0.4
12361236+and a name GW.LCS.MIT.EDU, the domain database would contain:
12371237+12381238+ 10.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU.
12391239+ 10.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU.
12401240+ 18.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU.
12411241+ 26.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU.
12421242+ 22.0.2.10.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU.
12431243+ 103.0.0.26.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU.
12441244+ 77.0.0.10.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU.
12451245+ 4.0.10.18.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU.
12461246+ 103.0.3.26.IN-ADDR.ARPA. PTR A.ISI.EDU.
12471247+ 6.0.0.10.IN-ADDR.ARPA. PTR MULTICS.MIT.EDU.
12481248+12491249+Thus a program which wanted to locate gateways on net 10 would originate
12501250+a query of the form QTYPE=PTR, QCLASS=IN, QNAME=10.IN-ADDR.ARPA. It
12511251+would receive two RRs in response:
12521252+12531253+ 10.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU.
12541254+ 10.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU.
12551255+12561256+The program could then originate QTYPE=A, QCLASS=IN queries for MILNET-
12571257+GW.ISI.EDU. and GW.LCS.MIT.EDU. to discover the Internet addresses of
12581258+these gateways.
12591259+12601260+A resolver which wanted to find the host name corresponding to Internet
12611261+host address 10.0.0.6 would pursue a query of the form QTYPE=PTR,
12621262+QCLASS=IN, QNAME=6.0.0.10.IN-ADDR.ARPA, and would receive:
12631263+12641264+ 6.0.0.10.IN-ADDR.ARPA. PTR MULTICS.MIT.EDU.
12651265+12661266+Several cautions apply to the use of these services:
12671267+ - Since the IN-ADDR.ARPA special domain and the normal domain
12681268+ for a particular host or gateway will be in different zones,
12691269+ the possibility exists that the data may be inconsistent.
12701270+12711271+ - Gateways will often have two names in separate domains, only
12721272+ one of which can be primary.
12731273+12741274+ - Systems that use the domain database to initialize their
12751275+ routing tables must start with enough gateway information to
12761276+ guarantee that they can access the appropriate name server.
12771277+12781278+ - The gateway data only reflects the existence of a gateway in a
12791279+ manner equivalent to the current HOSTS.TXT file. It doesn't
12801280+ replace the dynamic availability information from GGP or EGP.
12871287+12881288+12891289+3.6. Defining new types, classes, and special namespaces
12901290+12911291+The previously defined types and classes are the ones in use as of the
12921292+date of this memo. New definitions should be expected. This section
12931293+makes some recommendations to designers considering additions to the
12941294+existing facilities. The mailing list NAMEDROPPERS@SRI-NIC.ARPA is the
12951295+forum where general discussion of design issues takes place.
12961296+12971297+In general, a new type is appropriate when new information is to be
12981298+added to the database about an existing object, or we need new data
12991299+formats for some totally new object. Designers should attempt to define
13001300+types and their RDATA formats that are generally applicable to all
13011301+classes, and which avoid duplication of information. New classes are
13021302+appropriate when the DNS is to be used for a new protocol, etc which
13031303+requires new class-specific data formats, or when a copy of the existing
13041304+name space is desired, but a separate management domain is necessary.
13051305+13061306+New types and classes need mnemonics for master files; the format of the
13071307+master files requires that the mnemonics for type and class be disjoint.
13081308+13091309+TYPE and CLASS values must be a proper subset of QTYPEs and QCLASSes
13101310+respectively.
13111311+13121312+The present system uses multiple RRs to represent multiple values of a
13131313+type rather than storing multiple values in the RDATA section of a
13141314+single RR. This is less efficient for most applications, but does keep
13151315+RRs shorter. The multiple RRs assumption is incorporated in some
13161316+experimental work on dynamic update methods.
13171317+13181318+The present system attempts to minimize the duplication of data in the
13191319+database in order to insure consistency. Thus, in order to find the
13201320+address of the host for a mail exchange, you map the mail domain name to
13211321+a host name, then the host name to addresses, rather than a direct
13221322+mapping to host address. This approach is preferred because it avoids
13231323+the opportunity for inconsistency.
13241324+13251325+In defining a new type of data, multiple RR types should not be used to
13261326+create an ordering between entries or express different formats for
13271327+equivalent bindings, instead this information should be carried in the
13281328+body of the RR and a single type used. This policy avoids problems with
13291329+caching multiple types and defining QTYPEs to match multiple types.
13301330+13311331+For example, the original form of mail exchange binding used two RR
13321332+types one to represent a "closer" exchange (MD) and one to represent a
13331333+"less close" exchange (MF). The difficulty is that the presence of one
13341334+RR type in a cache doesn't convey any information about the other
13351335+because the query which acquired the cached information might have used
13361336+a QTYPE of MF, MD, or MAILA (which matched both). The redesigned
13431343+13441344+13451345+service used a single type (MX) with a "preference" value in the RDATA
13461346+section which can order different RRs. However, if any MX RRs are found
13471347+in the cache, then all should be there.
13481348+13491349+4. MESSAGES
13501350+13511351+4.1. Format
13521352+13531353+All communications inside of the domain protocol are carried in a single
13541354+format called a message. The top level format of message is divided
13551355+into 5 sections (some of which are empty in certain cases) shown below:
13561356+13571357+ +---------------------+
13581358+ | Header |
13591359+ +---------------------+
13601360+ | Question | the question for the name server
13611361+ +---------------------+
13621362+ | Answer | RRs answering the question
13631363+ +---------------------+
13641364+ | Authority | RRs pointing toward an authority
13651365+ +---------------------+
13661366+ | Additional | RRs holding additional information
13671367+ +---------------------+
13681368+13691369+The header section is always present. The header includes fields that
13701370+specify which of the remaining sections are present, and also specify
13711371+whether the message is a query or a response, a standard query or some
13721372+other opcode, etc.
13731373+13741374+The names of the sections after the header are derived from their use in
13751375+standard queries. The question section contains fields that describe a
13761376+question to a name server. These fields are a query type (QTYPE), a
13771377+query class (QCLASS), and a query domain name (QNAME). The last three
13781378+sections have the same format: a possibly empty list of concatenated
13791379+resource records (RRs). The answer section contains RRs that answer the
13801380+question; the authority section contains RRs that point toward an
13811381+authoritative name server; the additional records section contains RRs
13821382+which relate to the query, but are not strictly answers for the
13831383+question.
13991399+14001400+14011401+4.1.1. Header section format
14021402+14031403+The header contains the following fields:
14041404+14051405+ 1 1 1 1 1 1
14061406+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
14071407+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
14081408+ | ID |
14091409+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
14101410+ |QR| Opcode |AA|TC|RD|RA| Z | RCODE |
14111411+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
14121412+ | QDCOUNT |
14131413+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
14141414+ | ANCOUNT |
14151415+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
14161416+ | NSCOUNT |
14171417+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
14181418+ | ARCOUNT |
14191419+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
14201420+14211421+where:
14221422+14231423+ID A 16 bit identifier assigned by the program that
14241424+ generates any kind of query. This identifier is copied
14251425+ into the corresponding reply and can be used by the requester
14261426+ to match up replies to outstanding queries.
14271427+14281428+QR A one bit field that specifies whether this message is a
14291429+ query (0), or a response (1).
14301430+14311431+OPCODE A four bit field that specifies the kind of query in this
14321432+ message. This value is set by the originator of a query
14331433+ and copied into the response. The values are:
14341434+14351435+ 0 a standard query (QUERY)
14361436+14371437+ 1 an inverse query (IQUERY)
14381438+14391439+ 2 a server status request (STATUS)
14401440+14411441+ 3-15 reserved for future use
14421442+14431443+AA Authoritative Answer - this bit is valid in responses,
14441444+ and specifies that the responding name server is an
14451445+ authority for the domain name in the question section.
14461446+14471447+ Note that the contents of the answer section may have
14481448+ multiple owner names because of aliases. The AA bit
14551455+14561456+14571457+ corresponds to the name which matches the query name, or
14581458+ the first owner name in the answer section.
14591459+14601460+TC TrunCation - specifies that this message was truncated
14611461+ due to length greater than that permitted on the
14621462+ transmission channel.
14631463+14641464+RD Recursion Desired - this bit may be set in a query and
14651465+ is copied into the response. If RD is set, it directs
14661466+ the name server to pursue the query recursively.
14671467+ Recursive query support is optional.
14681468+14691469+RA Recursion Available - this bit is set or cleared in a
14701470+ response, and denotes whether recursive query support is
14711471+ available in the name server.
14721472+14731473+Z Reserved for future use. Must be zero in all queries
14741474+ and responses.
14751475+14761476+RCODE Response code - this 4 bit field is set as part of
14771477+ responses. The values have the following
14781478+ interpretation:
14791479+14801480+ 0 No error condition
14811481+14821482+ 1 Format error - The name server was
14831483+ unable to interpret the query.
14841484+14851485+ 2 Server failure - The name server was
14861486+ unable to process this query due to a
14871487+ problem with the name server.
14881488+14891489+ 3 Name Error - Meaningful only for
14901490+ responses from an authoritative name
14911491+ server, this code signifies that the
14921492+ domain name referenced in the query does
14931493+ not exist.
14941494+14951495+ 4 Not Implemented - The name server does
14961496+ not support the requested kind of query.
14971497+14981498+ 5 Refused - The name server refuses to
14991499+ perform the specified operation for
15001500+ policy reasons. For example, a name
15011501+ server may not wish to provide the
15021502+ information to the particular requester,
15031503+ or a name server may not wish to perform
15041504+ a particular operation (e.g., zone
15111511+15121512+15131513+ transfer) for particular data.
15141514+15151515+ 6-15 Reserved for future use.
15161516+15171517+QDCOUNT an unsigned 16 bit integer specifying the number of
15181518+ entries in the question section.
15191519+15201520+ANCOUNT an unsigned 16 bit integer specifying the number of
15211521+ resource records in the answer section.
15221522+15231523+NSCOUNT an unsigned 16 bit integer specifying the number of name
15241524+ server resource records in the authority records
15251525+ section.
15261526+15271527+ARCOUNT an unsigned 16 bit integer specifying the number of
15281528+ resource records in the additional records section.
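
As an illustration of the header layout described above, the following Python sketch packs and unpacks the twelve header octets. It is not part of the memo: the names `Header`, `pack_header` and `unpack_header` are hypothetical, and the only assumption beyond the text is that bit 0 of each diagram row is the most significant bit of a 16 bit word in network byte order.

```python
import struct
from typing import NamedTuple


class Header(NamedTuple):
    """The 12-octet header; field names mirror section 4.1.1."""
    id: int
    qr: int = 0
    opcode: int = 0
    aa: int = 0
    tc: int = 0
    rd: int = 0
    ra: int = 0
    z: int = 0
    rcode: int = 0
    qdcount: int = 0
    ancount: int = 0
    nscount: int = 0
    arcount: int = 0


def pack_header(h: Header) -> bytes:
    # Bit 0 in the diagram is the most significant bit of the flags word.
    flags = (
        (h.qr << 15) | (h.opcode << 11) | (h.aa << 10) | (h.tc << 9)
        | (h.rd << 8) | (h.ra << 7) | (h.z << 4) | h.rcode
    )
    return struct.pack("!6H", h.id, flags, h.qdcount, h.ancount,
                       h.nscount, h.arcount)


def unpack_header(data: bytes) -> Header:
    if len(data) < 12:
        raise ValueError("a DNS header is 12 octets long")
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", data[:12])
    return Header(
        id=ident,
        qr=(flags >> 15) & 0x1,
        opcode=(flags >> 11) & 0xF,
        aa=(flags >> 10) & 0x1,
        tc=(flags >> 9) & 0x1,
        rd=(flags >> 8) & 0x1,
        ra=(flags >> 7) & 0x1,
        z=(flags >> 4) & 0x7,
        rcode=flags & 0xF,
        qdcount=qd, ancount=an, nscount=ns, arcount=ar,
    )
```

A standard query with RD set and one question would be built as `pack_header(Header(id=0x1234, rd=1, qdcount=1))`.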
15291529+15301530+4.1.2. Question section format
15311531+15321532+The question section is used to carry the "question" in most queries,
15331533+i.e., the parameters that define what is being asked. The section
15341534+contains QDCOUNT (usually 1) entries, each of the following format:
15351535+15361536+ 1 1 1 1 1 1
15371537+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
15381538+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
15391539+ | |
15401540+ / QNAME /
15411541+ / /
15421542+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
15431543+ | QTYPE |
15441544+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
15451545+ | QCLASS |
15461546+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
15471547+15481548+where:
15491549+15501550+QNAME a domain name represented as a sequence of labels, where
15511551+ each label consists of a length octet followed by that
15521552+ number of octets. The domain name terminates with the
15531553+ zero length octet for the null label of the root. Note
15541554+ that this field may be an odd number of octets; no
15551555+ padding is used.
15561556+15571557+QTYPE a two octet code which specifies the type of the query.
15581558+ The values for this field include all codes valid for a
15591559+ TYPE field, together with some more general codes which
15601560+ can match more than one type of RR.
15671567+15681568+15691569+QCLASS a two octet code that specifies the class of the query.
15701570+ For example, the QCLASS field is IN for the Internet.
15711571+15721572+4.1.3. Resource record format
15731573+15741574+The answer, authority, and additional sections all share the same
15751575+format: a variable number of resource records, where the number of
15761576+records is specified in the corresponding count field in the header.
15771577+Each resource record has the following format:
15781578+ 1 1 1 1 1 1
15791579+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
15801580+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
15811581+ | |
15821582+ / /
15831583+ / NAME /
15841584+ | |
15851585+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
15861586+ | TYPE |
15871587+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
15881588+ | CLASS |
15891589+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
15901590+ | TTL |
15911591+ | |
15921592+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
15931593+ | RDLENGTH |
15941594+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
15951595+ / RDATA /
15961596+ / /
15971597+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
15981598+15991599+where:
16001600+16011601+NAME a domain name to which this resource record pertains.
16021602+16031603+TYPE two octets containing one of the RR type codes. This
16041604+ field specifies the meaning of the data in the RDATA
16051605+ field.
16061606+16071607+CLASS two octets which specify the class of the data in the
16081608+ RDATA field.
16091609+16101610+TTL a 32 bit unsigned integer that specifies the time
16111611+ interval (in seconds) that the resource record may be
16121612+ cached before it should be discarded. Zero values are
16131613+ interpreted to mean that the RR can only be used for the
16141614+ transaction in progress, and should not be cached.
16231623+16241624+16251625+RDLENGTH an unsigned 16 bit integer that specifies the length in
16261626+ octets of the RDATA field.
16271627+16281628+RDATA a variable length string of octets that describes the
16291629+ resource. The format of this information varies
16301630+ according to the TYPE and CLASS of the resource record.
16311631+ For example, if the TYPE is A and the CLASS is IN,
16321632+ the RDATA field is a 4 octet ARPA Internet address.
16331633+16341634+4.1.4. Message compression
16351635+16361636+In order to reduce the size of messages, the domain system utilizes a
16371637+compression scheme which eliminates the repetition of domain names in a
16381638+message. In this scheme, an entire domain name or a list of labels at
16391639+the end of a domain name is replaced with a pointer to a prior occurrence
16401640+of the same name.
16411641+16421642+The pointer takes the form of a two octet sequence:
16431643+16441644+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
16451645+ | 1 1| OFFSET |
16461646+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
16471647+16481648+The first two bits are ones. This allows a pointer to be distinguished
16491649+from a label, since the label must begin with two zero bits because
16501650+labels are restricted to 63 octets or less. (The 10 and 01 combinations
16511651+are reserved for future use.) The OFFSET field specifies an offset from
16521652+the start of the message (i.e., the first octet of the ID field in the
16531653+domain header). A zero offset specifies the first byte of the ID field,
16541654+etc.
16551655+16561656+The compression scheme allows a domain name in a message to be
16571657+represented as either:
16581658+16591659+ - a sequence of labels ending in a zero octet
16601660+16611661+ - a pointer
16621662+16631663+ - a sequence of labels ending with a pointer
16641664+16651665+Pointers can only be used for occurrences of a domain name where the
16661666+format is not class specific. If this were not the case, a name server
16671667+or resolver would be required to know the format of all RRs it handled.
16681668+As yet, there are no such cases, but they may occur in future RDATA
16691669+formats.
16701670+16711671+If a domain name is contained in a part of the message subject to a
16721672+length field (such as the RDATA section of an RR), and compression is
16791679+16801680+16811681+used, the length of the compressed name is used in the length
16821682+calculation, rather than the length of the expanded name.
16831683+16841684+Programs are free to avoid using pointers in messages they generate,
16851685+although this will reduce datagram capacity, and may cause truncation.
16861686+However, all programs are required to understand arriving messages that
16871687+contain pointers.
16881688+16891689+For example, a datagram might need to use the domain names F.ISI.ARPA,
16901690+FOO.F.ISI.ARPA, ARPA, and the root. Ignoring the other fields of the
16911691+message, these domain names might be represented as:
16921692+16931693+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
16941694+ 20 | 1 | F |
16951695+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
16961696+ 22 | 3 | I |
16971697+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
16981698+ 24 | S | I |
16991699+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
17001700+ 26 | 4 | A |
17011701+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
17021702+ 28 | R | P |
17031703+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
17041704+ 30 | A | 0 |
17051705+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
17061706+17071707+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
17081708+ 40 | 3 | F |
17091709+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
17101710+ 42 | O | O |
17111711+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
17121712+ 44 | 1 1| 20 |
17131713+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
17141714+17151715+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
17161716+ 64 | 1 1| 26 |
17171717+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
17181718+17191719+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
17201720+ 92 | 0 | |
17211721+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
17221722+17231723+The domain name for F.ISI.ARPA is shown at offset 20. The domain name
17241724+FOO.F.ISI.ARPA is shown at offset 40; this definition uses a pointer to
17251725+concatenate a label for FOO to the previously defined F.ISI.ARPA. The
17261726+domain name ARPA is defined at offset 64 using a pointer to the ARPA
17271727+component of the name F.ISI.ARPA at 20; note that this pointer relies on
17281728+ARPA being the last label in the string at 20. The root domain name is
17351735+17361736+17371737+defined by a single octet of zeros at 92; the root domain name has no
17381738+labels.
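
The example above can be reproduced with a short Python sketch that follows pointers while reading a name. The function name `read_name` is hypothetical, and the guard against pointer loops is an addition that this memo does not require. Label octets are decoded as Latin-1 only so that any octet value can be displayed.

```python
from typing import Tuple


def read_name(msg: bytes, offset: int) -> Tuple[str, int]:
    """Return (dotted name, offset just past the name as stored at `offset`)."""
    labels = []
    resume = None          # where parsing continues after the name
    visited = set()        # pointer targets already followed
    pos = offset
    while True:
        if pos >= len(msg):
            raise ValueError("name runs past the end of the message")
        length = msg[pos]
        if length & 0xC0 == 0xC0:            # two leading one bits: a pointer
            if pos + 1 >= len(msg):
                raise ValueError("truncated pointer")
            if resume is None:
                resume = pos + 2             # a pointer always ends the name
            target = ((length & 0x3F) << 8) | msg[pos + 1]
            if target in visited:
                raise ValueError("compression pointer loop")
            visited.add(target)
            pos = target
        elif length & 0xC0:                  # 10 and 01 are reserved
            raise ValueError("reserved label type")
        elif length == 0:                    # null label of the root
            if resume is None:
                resume = pos + 1
            return ".".join(labels) + ".", resume
        else:
            end = pos + 1 + length
            if end > len(msg):
                raise ValueError("truncated label")
            labels.append(msg[pos + 1:end].decode("latin-1"))
            pos = end


# The datagram fragment from the example, with all other octets left as zero.
example = bytearray(93)
example[20:32] = b"\x01F\x03ISI\x04ARPA\x00"
example[40:46] = b"\x03FOO\xc0\x14"          # FOO, then a pointer to offset 20
example[64:66] = b"\xc0\x1a"                 # a pointer to offset 26
```

Reading at offsets 20, 40, 64 and 92 gives F.ISI.ARPA, FOO.F.ISI.ARPA, ARPA and the root respectively.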
17391739+17401740+4.2. Transport
17411741+17421742+The DNS assumes that messages will be transmitted as datagrams or in a
17431743+byte stream carried by a virtual circuit. While virtual circuits can be
17441744+used for any DNS activity, datagrams are preferred for queries due to
17451745+their lower overhead and better performance. Zone refresh activities
17461746+must use virtual circuits because of the need for reliable transfer.
17471747+17481748+The Internet supports name server access using TCP [RFC-793] on server
17491749+port 53 (decimal) as well as datagram access using UDP [RFC-768] on UDP
17501750+port 53 (decimal).
17511751+17521752+4.2.1. UDP usage
17531753+17541754+Messages sent using UDP use server port 53 (decimal).
17551755+17561756+Messages carried by UDP are restricted to 512 bytes (not counting the IP
17571757+or UDP headers). Longer messages are truncated and the TC bit is set in
17581758+the header.
17591759+17601760+UDP is not acceptable for zone transfers, but is the recommended method
17611761+for standard queries in the Internet. Queries sent using UDP may be
17621762+lost, and hence a retransmission strategy is required. Queries or their
17631763+responses may be reordered by the network, or by processing in name
17641764+servers, so resolvers should not depend on them being returned in order.
17651765+17661766+The optimal UDP retransmission policy will vary with performance of the
17671767+Internet and the needs of the client, but the following are recommended:
17681768+17691769+ - The client should try other servers and server addresses
17701770+ before repeating a query to a specific address of a server.
17711771+17721772+ - The retransmission interval should be based on prior
17731773+ statistics if possible. Too aggressive retransmission can
17741774+ easily slow responses for the community at large. Depending
17751775+ on how well connected the client is to its expected servers,
17761776+ the minimum retransmission interval should be 2-5 seconds.
17771777+17781778+More suggestions on server selection and retransmission policy can be
17791779+found in the resolver section of this memo.
17801780+17811781+4.2.2. TCP usage
17821782+17831783+Messages sent over TCP connections use server port 53 (decimal). The
17841784+message is prefixed with a two byte length field which gives the message
17911791+17921792+17931793+length, excluding the two byte length field. This length field allows
17941794+the low-level processing to assemble a complete message before beginning
17951795+to parse it.
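
A minimal Python sketch of this framing is given below. The names `frame` and `split_frames` are hypothetical; the sketch only shows how the two byte length prefix lets a receiver cut complete messages out of a byte stream and keep any incomplete tail for the next read.

```python
import struct
from typing import List, Tuple


def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a two-byte big-endian integer."""
    if len(message) > 0xFFFF:
        raise ValueError("message does not fit a two byte length field")
    return struct.pack("!H", len(message)) + message


def split_frames(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Return the complete messages in `buffer` and the unconsumed remainder."""
    messages = []
    pos = 0
    while len(buffer) - pos >= 2:
        (length,) = struct.unpack_from("!H", buffer, pos)
        if len(buffer) - pos - 2 < length:
            break                      # message body not fully received yet
        messages.append(buffer[pos + 2:pos + 2 + length])
        pos += 2 + length
    return messages, buffer[pos:]
```

The length field counts only the message, so `frame(b"abc")` is five bytes long.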
17961796+17971797+Several connection management policies are recommended:
17981798+17991799+ - The server should not block other activities waiting for TCP
18001800+ data.
18011801+18021802+ - The server should support multiple connections.
18031803+18041804+ - The server should assume that the client will initiate
18051805+ connection closing, and should delay closing its end of the
18061806+ connection until all outstanding client requests have been
18071807+ satisfied.
18081808+18091809+ - If the server needs to close a dormant connection to reclaim
18101810+ resources, it should wait until the connection has been idle
18111811+ for a period on the order of two minutes. In particular, the
18121812+ server should allow the SOA and AXFR request sequence (which
18131813+ begins a refresh operation) to be made on a single connection.
18141814+ Since the server would be unable to answer queries anyway, a
18151815+ unilateral close or reset may be used instead of a graceful
18161816+ close.
18171817+18181818+5. MASTER FILES
18191819+18201820+Master files are text files that contain RRs in text form. Since the
18211821+contents of a zone can be expressed in the form of a list of RRs, a
18221822+master file is most often used to define a zone, though it can be used
18231823+to list a cache's contents. Hence, this section first discusses the
18241824+format of RRs in a master file, and then the special considerations when
18251825+a master file is used to create a zone in some name server.
18261826+18271827+5.1. Format
18281828+18291829+The format of these files is a sequence of entries. Entries are
18301830+predominantly line-oriented, though parentheses can be used to continue
18311831+a list of items across a line boundary, and text literals can contain
18321832+CRLF within the text. Any combination of tabs and spaces acts as a
18331833+delimiter between the separate items that make up an entry. Any line
18341834+in the master file can end with a comment. The comment starts
18351835+with a ";" (semicolon).
18361836+18371837+The following entries are defined:
18381838+18391839+ <blank>[<comment>]
18471847+18481848+18491849+ $ORIGIN <domain-name> [<comment>]
18501850+18511851+ $INCLUDE <file-name> [<domain-name>] [<comment>]
18521852+18531853+ <domain-name><rr> [<comment>]
18541854+18551855+ <blank><rr> [<comment>]
18561856+18571857+Blank lines, with or without comments, are allowed anywhere in the file.
18581858+18591859+Two control entries are defined: $ORIGIN and $INCLUDE. $ORIGIN is
18601860+followed by a domain name, and resets the current origin for relative
18611861+domain names to the stated name. $INCLUDE inserts the named file into
18621862+the current file, and may optionally specify a domain name that sets the
18631863+relative domain name origin for the included file. $INCLUDE may also
18641864+have a comment. Note that a $INCLUDE entry never changes the relative
18651865+origin of the parent file, regardless of changes to the relative origin
18661866+made within the included file.
18671867+18681868+The last two forms represent RRs. If an entry for an RR begins with a
18691869+blank, then the RR is assumed to be owned by the last stated owner. If
18701870+an RR entry begins with a <domain-name>, then the owner name is reset.
18711871+18721872+<rr> contents take one of the following forms:
18731873+18741874+ [<TTL>] [<class>] <type> <RDATA>
18751875+18761876+ [<class>] [<TTL>] <type> <RDATA>
18771877+18781878+The RR begins with optional TTL and class fields, followed by a type and
18791879+RDATA field appropriate to the type and class. Class and type use the
18801880+standard mnemonics; TTL is a decimal integer. Omitted class and TTL
18811881+values default to the last explicitly stated values. Since type and
18821882+class mnemonics are disjoint, the parse is unique. (Note that this
18831883+order is different from the order used in examples and the order used in
18841884+the actual RRs; the given order allows easier parsing and defaulting.)
18851885+18861886+<domain-name>s make up a large share of the data in the master file.
18871887+The labels in the domain name are expressed as character strings and
18881888+separated by dots. Quoting conventions allow arbitrary characters to be
18891889+stored in domain names. Domain names that end in a dot are called
18901890+absolute, and are taken as complete. Domain names which do not end in a
18911891+dot are called relative; the actual domain name is the concatenation of
18921892+the relative part with an origin specified in a $ORIGIN, $INCLUDE, or as
18931893+an argument to the master file loading routine. A relative name is an
18941894+error when no origin is available.
19031903+19041904+19051905+<character-string> is expressed in one of two ways: as a contiguous set
19061906+of characters without interior spaces, or as a string beginning with a "
19071907+and ending with a ". Inside a " delimited string any character can
19081908+occur, except for a " itself, which must be quoted using \ (back slash).
19091909+19101910+Because these files are text files several special encodings are
19111911+necessary to allow arbitrary data to be loaded. In particular:
19121912+19131913+ of the root.
19141914+19151915+@ A free standing @ is used to denote the current origin.
19161916+19171917+\X where X is any character other than a digit (0-9), is
19181918+ used to quote that character so that its special meaning
19191919+ does not apply. For example, "\." can be used to place
19201920+ a dot character in a label.
19211921+19221922+\DDD where each D is a digit is the octet corresponding to
19231923+ the decimal number described by DDD. The resulting
19241924+ octet is assumed to be text and is not checked for
19251925+ special meaning.
19261926+19271927+( ) Parentheses are used to group data that crosses a line
19281928+ boundary. In effect, line terminations are not
19291929+ recognized within parentheses.
19301930+19311931+; Semicolon is used to start a comment; the remainder of
19321932+ the line is ignored.
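
The \X and \DDD encodings can be illustrated with a small Python sketch that splits a master file domain name into labels. It is a sketch under stated assumptions: `parse_domain_name` is a hypothetical name, input outside the escapes is assumed to be ASCII, and the "@" shorthand, quoted strings and parentheses are deliberately not handled.

```python
from typing import List, Tuple

DIGITS = "0123456789"


def parse_domain_name(text: str) -> Tuple[List[bytes], bool]:
    """Return (labels as raw octets, True if the name is absolute)."""
    if text == "":
        raise ValueError("empty domain name")
    if text == ".":
        return [], True                       # the root, which has no labels
    labels = []
    current = bytearray()
    trailing_dot = False
    i = 0
    while i < len(text):
        ch = text[i]
        trailing_dot = False
        if ch == "\\":
            ddd = text[i + 1:i + 4]
            if len(ddd) == 3 and all(c in DIGITS for c in ddd):
                value = int(ddd)              # \DDD is a decimal octet value
                if value > 255:
                    raise ValueError("\\DDD escape out of range: " + ddd)
                current.append(value)
                i += 4
            elif i + 1 < len(text) and text[i + 1] not in DIGITS:
                current += text[i + 1].encode("ascii")   # \X quotes X
                i += 2
            else:
                raise ValueError("malformed escape at index %d" % i)
        elif ch == ".":                       # an unquoted dot ends a label
            labels.append(bytes(current))
            current = bytearray()
            trailing_dot = True
            i += 1
        else:
            current += ch.encode("ascii")
            i += 1
    if not trailing_dot:
        labels.append(bytes(current))
    return labels, trailing_dot
```

With this sketch, `Action\.domains` parses as a single relative label containing a dot, while `A.ISI.EDU.` parses as three labels of an absolute name.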
19331933+19341934+5.2. Use of master files to define zones
19351935+19361936+When a master file is used to load a zone, the operation should be
19371937+suppressed if any errors are encountered in the master file. The
19381938+rationale for this is that a single error can have widespread
19391939+consequences. For example, suppose that the RRs defining a delegation
19401940+have syntax errors; then the server will return authoritative name
19411941+errors for all names in the subzone (except in the case where the
19421942+subzone is also present on the server).
19431943+19441944+Several other validity checks should be performed in addition to
19451945+ensuring that the file is syntactically correct:
19461946+19471947+ 1. All RRs in the file should have the same class.
19481948+19491949+ 2. Exactly one SOA RR should be present at the top of the zone.
19501950+19511951+ 3. If delegations are present and glue information is required,
19521952+ it should be present.
19591959+19601960+19611961+ 4. Information present outside of the authoritative nodes in the
19621962+ zone should be glue information, rather than the result of an
19631963+ origin or similar error.
19641964+19651965+5.3. Master file example
19661966+19671967+The following is an example file which might be used to define the
19681968+ISI.EDU zone and is loaded with an origin of ISI.EDU:
19691969+19701970+@ IN SOA VENERA Action\.domains (
19711971+ 20 ; SERIAL
19721972+ 7200 ; REFRESH
19731973+ 600 ; RETRY
19741974+ 3600000; EXPIRE
19751975+ 60) ; MINIMUM
19761976+19771977+ NS A.ISI.EDU.
19781978+ NS VENERA
19791979+ NS VAXA
19801980+ MX 10 VENERA
19811981+ MX 20 VAXA
19821982+19831983+A A 26.3.0.103
19841984+19851985+VENERA A 10.1.0.52
19861986+ A 128.9.0.32
19871987+19881988+VAXA A 10.2.0.27
19891989+ A 128.9.0.33
19901990+19911991+19921992+$INCLUDE <SUBSYS>ISI-MAILBOXES.TXT
19931993+19941994+Where the file <SUBSYS>ISI-MAILBOXES.TXT is:
19951995+19961996+ MOE MB A.ISI.EDU.
19971997+ LARRY MB A.ISI.EDU.
19981998+ CURLEY MB A.ISI.EDU.
19991999+ STOOGES MG MOE
20002000+ MG LARRY
20012001+ MG CURLEY
20022002+20032003+Note the use of the \ character in the SOA RR to specify the responsible
20042004+person mailbox "Action.domains@E.ISI.EDU".
20152015+20162016+20172017+6. NAME SERVER IMPLEMENTATION
20182018+20192019+6.1. Architecture
20202020+20212021+The optimal structure for the name server will depend on the host
20222022+operating system and whether the name server is integrated with resolver
20232023+operations, either by supporting recursive service, or by sharing its
20242024+database with a resolver. This section discusses implementation
20252025+considerations for a name server which shares a database with a
20262026+resolver, but most of these concerns are present in any name server.
20272027+20282028+6.1.1. Control
20292029+20302030+A name server must employ multiple concurrent activities, whether they
20312031+are implemented as separate tasks in the host's OS or multiplexing
20322032+inside a single name server program. It is simply not acceptable for a
20332033+name server to block the service of UDP requests while it waits for TCP
20342034+data for refreshing or query activities. Similarly, a name server
20352035+should not attempt to provide recursive service without processing such
20362036+requests in parallel, though it may choose to serialize requests from a
20372037+single client, or to regard identical requests from the same client as
20382038+duplicates. A name server should not substantially delay requests while
20392039+it reloads a zone from master files or while it incorporates a newly
20402040+refreshed zone into its database.
20412041+20422042+6.1.2. Database
20432043+20442044+While name server implementations are free to use any internal data
20452045+structures they choose, the suggested structure consists of three major
20462046+parts:
20472047+20482048+ - A "catalog" data structure which lists the zones available to
20492049+ this server, and a "pointer" to the zone data structure. The
20502050+ main purpose of this structure is to find the nearest ancestor
20512051+ zone, if any, for arriving standard queries.
20522052+20532053+ - Separate data structures for each of the zones held by the
20542054+ name server.
20552055+20562056+ - A data structure for cached data (or perhaps separate caches
20572057+ for different classes).
20582058+20592059+All of these data structures can be implemented in an identical tree
20602060+structure format, with different data chained off the nodes in different
20612061+parts: in the catalog the data is pointers to zones, while in the zone
20622062+and cache data structures, the data will be RRs. In designing the tree
20632063+framework the designer should recognize that query processing will need
20642064+to traverse the tree using case-insensitive label comparisons; and that
20712071+20722072+20732073+in real data, a few nodes have a very high branching factor (100-1000 or
20742074+more), but the vast majority have a very low branching factor (0-1).
20752075+20762076+One way to solve the case problem is to store the labels for each node
20772077+in two pieces: a standardized-case representation of the label where all
20782078+ASCII characters are in a single case, together with a bit mask that
20792079+denotes which characters are actually of a different case. The
20802080+branching factor diversity can be handled using a simple linked list for
20812081+a node until the branching factor exceeds some threshold, and
20822082+transitioning to a hash structure after the threshold is exceeded. In
20832083+any case, hash structures used to store tree sections must ensure that
20842084+hash functions and procedures preserve the casing conventions of the
20852085+DNS.
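
The two piece label representation suggested above might look like the following Python sketch. The names `split_case` and `restore_case` are hypothetical, lower case is chosen arbitrarily as the standardized case, and bit i of the mask marks octet i of the label as originally upper case.

```python
from typing import Tuple


def split_case(label: bytes) -> Tuple[bytes, int]:
    """Return (lower-cased label, bit mask of originally upper-case octets)."""
    folded = label.lower()        # bytes.lower() changes ASCII letters only
    mask = 0
    for i, (original, lowered) in enumerate(zip(label, folded)):
        if original != lowered:
            mask |= 1 << i
    return folded, mask


def restore_case(folded: bytes, mask: int) -> bytes:
    """Rebuild the original label from the folded form and the mask."""
    # An upper-case ASCII letter is its lower-case form minus 32.
    return bytes(b - 32 if (mask >> i) & 1 else b
                 for i, b in enumerate(folded))
```

Comparisons and hashing then use only the folded form, so `VENERA` and `venera` land on the same node, while the mask lets the stored spelling be returned unchanged.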
20862086+20872087+The use of separate structures for the different parts of the database
20882088+is motivated by several factors:
20892089+20902090+ - The catalog structure can be an almost static structure that
20912091+ need change only when the system administrator changes the
20922092+ zones supported by the server. This structure can also be
20932093+ used to store parameters used to control refreshing
20942094+ activities.
20952095+20962096+ - The individual data structures for zones allow a zone to be
20972097+ replaced simply by changing a pointer in the catalog. Zone
20982098+ refresh operations can build a new structure and, when
20992099+ complete, splice it into the database via a simple pointer
21002100+ replacement. It is very important that when a zone is
21012101+ refreshed, queries should not use old and new data
21022102+ simultaneously.
21032103+21042104+ - With the proper search procedures, authoritative data in zones
21052105+ will always "hide", and hence take precedence over, cached
21062106+ data.
21072107+21082108+ - Errors in zone definitions that cause overlapping zones, etc.,
21092109+ may cause erroneous responses to queries, but problem
21102110+ determination is simplified, and the contents of one "bad"
21112111+ zone can't corrupt another.
21122112+21132113+ - Since the cache is most frequently updated, it is most
21142114+ vulnerable to corruption during system restarts. It can also
21152115+ become full of expired RR data. In either case, it can easily
21162116+ be discarded without disturbing zone data.
21172117+21182118+A major aspect of database design is selecting a structure which allows
21192119+the name server to deal with crashes of the name server's host. State
21202120+information which a name server should save across system crashes
21272127+21282128+21292129+includes the catalog structure (including the state of refreshing for
21302130+each zone) and the zone data itself.
21312131+21322132+6.1.3. Time
21332133+21342134+Both the TTL data for RRs and the timing data for refreshing activities
21352135+depend on 32 bit timers in units of seconds. Inside the database,
21362136+refresh timers and TTLs for cached data conceptually "count down", while
21372137+data in the zone stays with constant TTLs.
21382138+21392139+A recommended implementation strategy is to store time in two ways: as
21402140+a relative increment and as an absolute time. One way to do this is to
21412141+use positive 32 bit numbers for one type and negative numbers for the
21422142+other. The RRs in zones use relative times; the refresh timers and
21432143+cache data use absolute times. Absolute numbers are taken with respect
21442144+to some known origin and converted to relative values when placed in the
21452145+response to a query. When an absolute TTL is negative after conversion
21462146+to relative, then the data is expired and should be ignored.
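
The conversion between the two time formats can be sketched as follows. This is a minimal illustration, not part of the specification: the function names are hypothetical, the current time is passed in explicitly, and the sign-tagging trick mentioned above is left out in favor of separate functions.

```python
def to_absolute(relative_ttl: int, now: int) -> int:
    """Convert a relative TTL (seconds) into an absolute expiry time."""
    return now + relative_ttl


def to_relative(absolute_expiry: int, now: int) -> int:
    """Convert an absolute expiry time back into a relative TTL."""
    return absolute_expiry - now


def is_expired(absolute_expiry: int, now: int) -> bool:
    """Data is expired when the converted relative TTL is negative."""
    return to_relative(absolute_expiry, now) < 0
```

Under these assumptions a cached RR stored with to_absolute(300, now) is still usable when its converted TTL is exactly zero, and is ignored once the value goes negative.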
21472147+21482148+6.2. Standard query processing
21492149+21502150+The major algorithm for standard query processing is presented in
21512151+[RFC-1034].
21522152+21532153+When processing queries with QCLASS=*, or some other QCLASS which
21542154+matches multiple classes, the response should never be authoritative
21552155+unless the server can guarantee that the response covers all classes.
21562156+21572157+When composing a response, RRs which are to be inserted in the
21582158+additional section, but duplicate RRs in the answer or authority
21592159+sections, may be omitted from the additional section.
21602160+21612161+When a response is so long that truncation is required, the truncation
21622162+should start at the end of the response and work forward in the
21632163+datagram. Thus if there is any data for the authority section, the
21642164+answer section is guaranteed to be unique.
21652165+21662166+The MINIMUM value in the SOA should be used to set a floor on the TTL of
21672167+data distributed from a zone. This floor function should be done when
21682168+the data is copied into a response. This will allow future dynamic
21692169+update protocols to change the SOA MINIMUM field without ambiguous
21702170+semantics.
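
As a rough sketch of applying the floor at response time, assuming a hypothetical RR record type and helper name, the zone copy keeps its original TTL and only the copy placed in the response is adjusted:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RR:
    # Hypothetical, simplified resource record used only for illustration.
    name: str
    rtype: str
    ttl: int
    rdata: str


def copy_into_response(zone_rrs, soa_minimum):
    """Return copies of the zone RRs with TTLs raised to the SOA MINIMUM floor.

    The zone data itself is left untouched, so a later change to the
    SOA MINIMUM field takes effect on the next response.
    """
    return [replace(rr, ttl=max(rr.ttl, soa_minimum)) for rr in zone_rrs]
```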
21712171+21722172+6.3. Zone refresh and reload processing
21732173+21742174+In spite of a server's best efforts, it may be unable to load zone data
21752175+from a master file due to syntax errors, etc., or be unable to refresh a
21762176+zone within its expiration parameter. In this case, the name server
21832183+21842184+21852185+should answer queries as if it were not supposed to possess the zone.
21862186+21872187+If a master is sending a zone out via AXFR, and a new version is created
21882188+during the transfer, the master should continue to send the old version
21892189+if possible. In any case, it should never send part of one version and
21902190+part of another. If completion is not possible, the master should reset
21912191+the connection on which the zone transfer is taking place.
21922192+21932193+6.4. Inverse queries (Optional)
21942194+21952195+Inverse queries are an optional part of the DNS. Name servers are not
21962196+required to support any form of inverse queries. If a name server
21972197+receives an inverse query that it does not support, it returns an error
21982198+response with the "Not Implemented" error set in the header. While
21992199+inverse query support is optional, all name servers must be at least
22002200+able to return the error response.
22012201+22022202+6.4.1. The contents of inverse queries and responses
22032203+Inverse queries reverse the mappings performed by standard query operations;
22042204+while a standard query maps a domain name to a resource, an inverse
22052205+query maps a resource to a domain name. For example, a standard query
22062206+might bind a domain name to a host address; the corresponding inverse
22072207+query binds the host address to a domain name.
22082208+22092209+Inverse queries take the form of a single RR in the answer section of
22102210+the message, with an empty question section. The owner name of the
22112211+query RR and its TTL are not significant. The response carries
22122212+questions in the question section which identify all names possessing
22132213+the query RR WHICH THE NAME SERVER KNOWS. Since no name server knows
22142214+about all of the domain name space, the response can never be assumed to
22152215+be complete. Thus inverse queries are primarily useful for database
22162216+management and debugging activities. Inverse queries are NOT an
22172217+acceptable method of mapping host addresses to host names; use the IN-
22182218+ADDR.ARPA domain instead.
22192219+22202220+Where possible, name servers should provide case-insensitive comparisons
22212221+for inverse queries. Thus an inverse query asking for an MX RR of
22222222+"Venera.isi.edu" should get the same response as a query for
22232223+"VENERA.ISI.EDU"; an inverse query for HINFO RR "IBM-PC UNIX" should
22242224+produce the same result as an inverse query for "IBM-pc unix". However,
22252225+this cannot be guaranteed because name servers may possess RRs that
22262226+contain character strings but the name server does not know that the
22272227+data is character.
22282228+22292229+When a name server processes an inverse query, it either returns:
22302230+22312231+ 1. zero, one, or multiple domain names for the specified
22322232+ resource as QNAMEs in the question section
22392239+22402240+22412241+ 2. an error code indicating that the name server doesn't support
22422242+ inverse mapping of the specified resource type.
22432243+22442244+When the response to an inverse query contains one or more QNAMEs, the
22452245+owner name and TTL of the RR in the answer section which defines the
22462246+inverse query are modified to exactly match an RR found at the first
22472247+QNAME.
22482248+22492249+RRs returned in the inverse queries cannot be cached using the same
22502250+mechanism as is used for the replies to standard queries. One reason
22512251+for this is that a name might have multiple RRs of the same type, and
22522252+only one would appear. For example, an inverse query for a single
22532253+address of a multiply homed host might create the impression that only
22542254+one address existed.
22552255+22562256+6.4.2. Inverse query and response example
22572257+The overall structure of an inverse query for retrieving the domain name
22582258+that corresponds to Internet address 10.1.0.52 is shown below:
22592259+22602260+ +-----------------------------------------+
22612261+ Header | OPCODE=IQUERY, ID=997 |
22622262+ +-----------------------------------------+
22632263+ Question | <empty> |
22642264+ +-----------------------------------------+
22652265+ Answer | <anyname> A IN 10.1.0.52 |
22662266+ +-----------------------------------------+
22672267+ Authority | <empty> |
22682268+ +-----------------------------------------+
22692269+ Additional | <empty> |
22702270+ +-----------------------------------------+
22712271+22722272+This query asks for a question whose answer is the Internet style
22732273+address 10.1.0.52. Since the owner name is not known, any domain name
22742274+can be used as a placeholder (and is ignored). A single octet of zero,
22752275+signifying the root, is usually used because it minimizes the length of
22762276+the message. The TTL of the RR is not significant. The response to
22772277+this query might be:
22952295+22962296+22972297+ +-----------------------------------------+
22982298+ Header | OPCODE=RESPONSE, ID=997 |
22992299+ +-----------------------------------------+
23002300+ Question |QTYPE=A, QCLASS=IN, QNAME=VENERA.ISI.EDU |
23012301+ +-----------------------------------------+
23022302+ Answer | VENERA.ISI.EDU A IN 10.1.0.52 |
23032303+ +-----------------------------------------+
23042304+ Authority | <empty> |
23052305+ +-----------------------------------------+
23062306+ Additional | <empty> |
23072307+ +-----------------------------------------+
23082308+23092309+Note that the QTYPE in a response to an inverse query is the same as the
23102310+TYPE field in the answer section of the inverse query. Responses to
23112311+inverse queries may contain multiple questions when the inverse is not
23122312+unique. If the question section in the response is not empty, then the
23132313+RR in the answer section is modified to be an exact copy
23142314+of an RR at the first QNAME.
23152315+23162316+6.4.3. Inverse query processing
23172317+23182318+Name servers that support inverse queries can support these operations
23192319+through exhaustive searches of their databases, but this becomes
23202320+impractical as the size of the database increases. An alternative
23212321+approach is to invert the database according to the search key.
23222322+23232323+For name servers that support multiple zones and a large amount of data,
23242324+the recommended approach is separate inversions for each zone. When a
23252325+particular zone is changed during a refresh, only its inversions need to
23262326+be redone.
23272327+23282328+Support for transfer of this type of inversion may be included in future
23292329+versions of the domain system, but is not supported in this version.
23302330+23312331+6.5. Completion queries and responses
23322332+23332333+The optional completion services described in RFC-882 and RFC-883 have
23342334+been deleted. Redesigned services may become available in the future.
23512351+23522352+23532353+7. RESOLVER IMPLEMENTATION
23542354+23552355+The top levels of the recommended resolver algorithm are discussed in
23562356+[RFC-1034]. This section discusses implementation details assuming the
23572357+database structure suggested in the name server implementation section
23582358+of this memo.
23592359+23602360+7.1. Transforming a user request into a query
23612361+23622362+The first step a resolver takes is to transform the client's request,
23632363+stated in a format suitable to the local OS, into a search specification
23642364+for RRs at a specific name which match a specific QTYPE and QCLASS.
23652365+Where possible, the QTYPE and QCLASS should correspond to a single type
23662366+and a single class, because this makes the use of cached data much
23672367+simpler. The reason for this is that the presence of data of one type
23682368+in a cache doesn't confirm the existence or non-existence of data of
23692369+other types, hence the only way to be sure is to consult an
23702370+authoritative source. If QCLASS=* is used, then authoritative answers
23712371+won't be available.
23722372+23732373+Since a resolver must be able to multiplex multiple requests if it is to
23742374+perform its function efficiently, each pending request is usually
23752375+represented in some block of state information. This state block will
23762376+typically contain:
23772377+23782378+ - A timestamp indicating the time the request began.
23792379+ The timestamp is used to decide whether RRs in the database
23802380+ can be used or are out of date. This timestamp uses the
23812381+ absolute time format previously discussed for RR storage in
23822382+ zones and caches. Note that when an RR's TTL indicates a
23832383+ relative time, the RR must be timely, since it is part of a
23842384+ zone. When the RR has an absolute time, it is part of a
23852385+ cache, and the TTL of the RR is compared against the timestamp
23862386+ for the start of the request.
23872387+23882388+ Note that using the timestamp is superior to using a current
23892389+ time, since it allows RRs with TTLs of zero to be entered in
23902390+ the cache in the usual manner, but still used by the current
23912391+ request, even after intervals of many seconds due to system
23922392+ load, query retransmission timeouts, etc.
23932393+23942394+ - Some sort of parameters to limit the amount of work which will
23952395+ be performed for this request.
23962396+23972397+ The amount of work which a resolver will do in response to a
23982398+ client request must be limited to guard against errors in the
23992399+ database, such as circular CNAME references, and operational
24002400+ problems, such as network partition which prevents the
24072407+24082408+24092409+ resolver from accessing the name servers it needs. While
24102410+ local limits on the number of times a resolver will retransmit
24112411+ a particular query to a particular name server address are
24122412+ essential, the resolver should have a global per-request
24132413+ counter to limit work on a single request. The counter should
24142414+ be set to some initial value and decremented whenever the
24152415+ resolver performs any action (retransmission timeout,
24162416+ retransmission, etc.). If the counter passes zero, the request
24172417+ is terminated with a temporary error.
24182418+24192419+ Note that if the resolver structure allows one request to
24202420+ start others in parallel, such as when the need to access a
24212421+ name server for one request causes a parallel resolve for the
24222422+ name server's addresses, the spawned request should be started
24232423+ with a lower counter. This prevents circular references in
24242424+ the database from starting a chain reaction of resolver
24252425+ activity.
24262426+24272427+ - The SLIST data structure discussed in [RFC-1034].
24282428+24292429+ This structure keeps track of the state of a request if it
24302430+ must wait for answers from foreign name servers.
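
The state block described in the list above might look roughly like the following sketch. The class, field, and method names are hypothetical, and the rule used for the spawned request (one less than the parent's remaining budget) is only one possible reading of "a lower counter".

```python
from dataclasses import dataclass, field


class TemporaryError(Exception):
    """Raised when a request exhausts its work budget."""


@dataclass
class RequestState:
    started_at: int   # absolute timestamp taken when the request began
    work_left: int    # global per-request work counter
    slist: list = field(default_factory=list)  # candidate servers (SLIST)

    def charge(self) -> None:
        """Account for one action (transmission, timeout, and so on)."""
        self.work_left -= 1
        if self.work_left < 0:
            # The counter has passed zero: give up with a temporary error.
            raise TemporaryError("work limit exceeded")

    def spawn(self, started_at: int) -> "RequestState":
        """Start a subordinate request with a strictly lower counter."""
        return RequestState(started_at=started_at, work_left=self.work_left - 1)
```

Because every spawned request starts below its parent, a circular reference in the database runs out of budget instead of multiplying resolver activity.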
24312431+24322432+7.2. Sending the queries
24332433+24342434+As described in [RFC-1034], the basic task of the resolver is to
24352435+formulate a query which will answer the client's request and direct that
24362436+query to name servers which can provide the information. The resolver
24372437+will usually only have very strong hints about which servers to ask, in
24382438+the form of NS RRs, and may have to revise the query, in response to
24392439+CNAMEs, or revise the set of name servers the resolver is asking, in
24402440+response to delegation responses which point the resolver to name
24412441+servers closer to the desired information. In addition to the
24422442+information requested by the client, the resolver may have to call upon
24432443+its own services to determine the address of name servers it wishes to
24442444+contact.
24452445+24462446+In any case, the model used in this memo assumes that the resolver is
24472447+multiplexing attention between multiple requests, some from the client,
24482448+and some internally generated. Each request is represented by some
24492449+state information, and the desired behavior is that the resolver
24502450+transmit queries to name servers in a way that maximizes the probability
24512451+that the request is answered, minimizes the time that the request takes,
24522452+and avoids excessive transmissions. The key algorithm uses the state
24532453+information of the request to select the next name server address to
24542454+query, and also computes a timeout which will cause the next action
24552455+should a response not arrive. The next action will usually be a
24562456+transmission to some other server, but may be a temporary error to the
24632463+24642464+24652465+client.
24662466+24672467+The resolver always starts with a list of server names to query (SLIST).
24682468+This list will be all NS RRs which correspond to the nearest ancestor
24692469+zone that the resolver knows about. To avoid startup problems, the
24702470+resolver should have a set of default servers which it will ask should
24712471+it have no current NS RRs which are appropriate. The resolver then adds
24722472+to SLIST all of the known addresses for the name servers, and may start
24732473+parallel requests to acquire the addresses of the servers when the
24742474+resolver has the name, but no addresses, for the name servers.
24752475+24762476+To complete initialization of SLIST, the resolver attaches whatever
24772477+history information it has to each address in SLIST. This will
24782478+usually consist of some sort of weighted averages for the response time
24792479+of the address, and the batting average of the address (i.e., how often
24802480+the address responded at all to the request). Note that this
24812481+information should be kept on a per address basis, rather than on a per
24822482+name server basis, because the response time and batting average of a
24832483+particular server may vary considerably from address to address. Note
24842484+also that this information is actually specific to a resolver address /
24852485+server address pair, so a resolver with multiple addresses may wish to
24862486+keep separate histories for each of its addresses. Part of this step
24872487+must deal with addresses which have no such history; in this case an
24882488+expected round trip time of 5-10 seconds should be the worst case, with
24892489+lower estimates for the same local network, etc.
24902490+24912491+Note that whenever a delegation is followed, the resolver algorithm
24922492+reinitializes SLIST.
24932493+24942494+The information establishes a partial ranking of the available name
24952495+server addresses. Each time an address is chosen, the state should
24962496+be altered to prevent its selection again until all other addresses have
24972497+been tried. The timeout for each transmission should be 50-100% greater
24982498+than the average predicted value to allow for variance in response.
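
One way to picture the selection and timeout rules is the sketch below. The record layout, the scoring formula (average response time divided by batting average), and the 5 second default for addresses without history are all assumptions chosen for illustration; the text only requires a partial ranking and a timeout 50-100% above the predicted value.

```python
from dataclasses import dataclass

DEFAULT_RTT = 5.0  # assumed estimate (seconds) for addresses with no history


@dataclass
class AddressHistory:
    address: str
    avg_rtt: float = DEFAULT_RTT  # weighted average response time, seconds
    batting: float = 1.0          # fraction of queries that were answered
    tried: bool = False           # set once chosen in the current round


def next_address(slist):
    """Pick the best untried address; start a new round once all were tried."""
    if not slist:
        return None
    if all(h.tried for h in slist):
        for h in slist:
            h.tried = False
    candidates = [h for h in slist if not h.tried]
    # Lower score is better: fast addresses that usually answer come first.
    best = min(candidates, key=lambda h: h.avg_rtt / max(h.batting, 0.01))
    best.tried = True
    return best


def timeout_for(history, margin=0.5):
    """Timeout set `margin` (0.5 to 1.0) above the predicted response time."""
    return history.avg_rtt * (1.0 + margin)
```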
24992499+25002500+Some fine points:
25012501+25022502+ - The resolver may encounter a situation where no addresses are
25032503+ available for any of the name servers named in SLIST, and
25042504+ where the servers in the list are precisely those which would
25052505+ normally be used to look up their own addresses. This
25062506+ situation typically occurs when the glue address RRs have a
25072507+ smaller TTL than the NS RRs marking delegation, or when the
25082508+ resolver caches the result of a NS search. The resolver
25092509+ should detect this condition and restart the search at the
25102510+ next ancestor zone, or alternatively at the root.
25192519+25202520+25212521+ - If a resolver gets a server error or other bizarre response
25222522+ from a name server, it should remove it from SLIST, and may
25232523+ wish to schedule an immediate transmission to the next
25242524+ candidate server address.
25252525+25262526+7.3. Processing responses
25272527+25282528+The first step in processing arriving response datagrams is to parse the
25292529+response. This procedure should include:
25302530+25312531+ - Check the header for reasonableness. Discard datagrams which
25322532+ are queries when responses are expected.
25342534+ - Parse the sections of the message, and ensure that all RRs are
25352535+ correctly formatted.
25362536+25372537+ - As an optional step, check the TTLs of arriving data looking
25382538+ for RRs with excessively long TTLs. If an RR has an
25392539+ excessively long TTL, say greater than 1 week, either discard
25402540+ the whole response, or limit all TTLs in the response to 1
25412541+ week.
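
The optional TTL check could be sketched as below. The function name and the `discard` switch are hypothetical; the one week limit is the example value from the text.

```python
ONE_WEEK = 7 * 24 * 60 * 60  # seconds


def check_ttls(ttls, limit=ONE_WEEK, discard=False):
    """Return the TTLs to use, or None if the whole response is discarded.

    If any TTL exceeds `limit`, either drop the response (discard=True)
    or cap every TTL in the response at `limit`.
    """
    if all(t <= limit for t in ttls):
        return list(ttls)
    if discard:
        return None
    return [min(t, limit) for t in ttls]
```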
25422542+25432543+The next step is to match the response to a current resolver request.
25442544+The recommended strategy is to do a preliminary matching using the ID
25452545+field in the domain header, and then to verify that the question section
25462546+corresponds to the information currently desired. This requires that
25472547+the transmission algorithm devote several bits of the domain ID field to
25482548+a request identifier of some sort. This step has several fine points:
25492549+25502550+ - Some name servers send their responses from different
25512551+ addresses than the one used to receive the query. That is, a
25522552+ resolver cannot rely on a response coming from the same
25532553+ address to which it sent the corresponding query. This name
25542554+ server bug is typically encountered in UNIX systems.
25552555+25562556+ - If the resolver retransmits a particular request to a name
25572557+ server it should be able to use a response from any of the
25582558+ transmissions. However, if it is using the response to sample
25592559+ the round trip time to access the name server, it must be able
25602560+ to determine which transmission matches the response (and keep
25612561+ transmission times for each outgoing message), or only
25622562+ calculate round trip times based on initial transmissions.
25632563+25642564+ - A name server will occasionally not have a current copy of a
25652565+ zone which it should have according to some NS RRs. The
25662566+ resolver should simply remove the name server from the current
25672567+ SLIST, and continue.
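
A minimal sketch of the two-stage matching follows, assuming pending requests are kept in a dictionary keyed by message ID (a hypothetical layout). In line with the first fine point above, the source address of the response is deliberately not checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    qname: str
    qtype: str
    qclass: str


def match_response(pending, msg_id, question):
    """Return True if a response matches an outstanding request.

    Stage 1: preliminary match on the ID field of the header.
    Stage 2: verify the question section; names compare case-insensitively.
    """
    expected = pending.get(msg_id)
    if expected is None:
        return False
    return (expected.qname.lower() == question.qname.lower()
            and expected.qtype == question.qtype
            and expected.qclass == question.qclass)
```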
25752575+25762576+25772577+7.4. Using the cache
25782578+25792579+In general, we expect a resolver to cache all data which it receives in
25802580+responses since it may be useful in answering future client requests.
25812581+However, there are several types of data which should not be cached:
25822582+25832583+ - When several RRs of the same type are available for a
25842584+ particular owner name, the resolver should either cache them
25852585+ all or none at all. When a response is truncated, and a
25862586+ resolver doesn't know whether it has a complete set, it should
25872587+ not cache a possibly partial set of RRs.
25882588+25892589+ - Cached data should never be used in preference to
25902590+ authoritative data, so if caching would cause this to happen
25912591+ the data should not be cached.
25922592+25932593+ - The results of an inverse query should not be cached.
25942594+25952595+ - The results of standard queries where the QNAME contains "*"
25962596+ labels if the data might be used to construct wildcards. The
25972597+ reason is that the cache does not necessarily contain existing
25982598+ RRs or zone boundary information which is necessary to
25992599+ restrict the application of the wildcard RRs.
26002600+26012601+ - RR data in responses of dubious reliability. When a resolver
26022602+ receives unsolicited responses or RR data other than that
26032603+ requested, it should discard it without caching it. The basic
26042604+ implication is that all sanity checks on a packet should be
26052605+ performed before any of it is cached.
26062606+26072607+In a similar vein, when a resolver has a set of RRs for some name in a
26082608+response, and wants to cache the RRs, it should check its cache for
26092609+already existing RRs. Depending on the circumstances, either the data
26102610+in the response or the cache is preferred, but the two should never be
26112611+combined. If the data in the response is from authoritative data in the
26122612+answer section, it is always preferred.
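
The "prefer one or the other, never combine" rule might be sketched as follows. The cache layout (a dictionary keyed by name and type) and the policy of keeping existing entries when the response is not authoritative are assumptions; the text leaves that choice to the circumstances.

```python
def merge_into_cache(cache, key, response_rrs, authoritative_answer,
                     truncated=False):
    """Store a complete RR set in the cache without mixing old and new data."""
    if truncated:
        # The set may be partial, so it is not cached at all.
        return
    if key in cache and not authoritative_answer:
        # Assumed policy: keep what is already cached.
        return
    # Replace the whole set; cached and response RRs are never combined.
    cache[key] = list(response_rrs)
```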
26132613+26142614+8. MAIL SUPPORT
26152615+26162616+The domain system defines a standard for mapping mailboxes into domain
26172617+names, and two methods for using the mailbox information to derive mail
26182618+routing information. The first method is called mail exchange binding
26192619+and the other method is mailbox binding. The mailbox encoding standard
26202620+and mail exchange binding are part of the DNS official protocol, and are
26212621+the recommended method for mail routing in the Internet. Mailbox
26222622+binding is an experimental feature which is still under development and
26232623+subject to change.
26312631+26322632+26332633+The mailbox encoding standard assumes a mailbox name of the form
26342634+"<local-part>@<mail-domain>". While the syntax allowed in each of these
26352635+sections varies substantially between the various mail internets, the
26362636+preferred syntax for the ARPA Internet is given in [RFC-822].
26372637+26382638+The DNS encodes the <local-part> as a single label, and encodes the
26392639+<mail-domain> as a domain name. The single label from the <local-part>
26402640+is prefaced to the domain name from <mail-domain> to form the domain
26412641+name corresponding to the mailbox. Thus the mailbox HOSTMASTER@SRI-
26422642+NIC.ARPA is mapped into the domain name HOSTMASTER.SRI-NIC.ARPA. If the
26432643+<local-part> contains dots or other special characters, its
26442644+representation in a master file will require the use of backslash
26452645+quoting to ensure that the domain name is properly encoded. For
26462646+example, the mailbox Action.domains@ISI.EDU would be represented as
26472647+Action\.domains.ISI.EDU.
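
The mapping from a mailbox to its master file representation can be sketched as below. The function name is hypothetical, and only dots and backslashes in the <local-part> are quoted; other special characters are not handled in this sketch.

```python
def mailbox_to_domain(mailbox: str) -> str:
    """Map "<local-part>@<mail-domain>" to a master file domain name."""
    local, _, domain = mailbox.rpartition("@")
    if not local or not domain:
        raise ValueError("expected <local-part>@<mail-domain>")
    # Quote backslashes first, then dots, so the local part stays one label.
    escaped = local.replace("\\", "\\\\").replace(".", "\\.")
    return f"{escaped}.{domain}"
```

With this sketch, HOSTMASTER@SRI-NIC.ARPA maps to HOSTMASTER.SRI-NIC.ARPA and Action.domains@ISI.EDU maps to Action\.domains.ISI.EDU, matching the examples in the text.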
26482648+26492649+8.1. Mail exchange binding
26502650+26512651+Mail exchange binding uses the <mail-domain> part of a mailbox
26522652+specification to determine where mail should be sent. The <local-part>
26532653+is not even consulted. [RFC-974] specifies this method in detail, and
26542654+should be consulted before attempting to use mail exchange support.
26552655+26562656+One of the advantages of this method is that it decouples mail
26572657+destination naming from the hosts used to support mail service, at the
26582658+cost of another layer of indirection in the lookup function. However,
26592659+the additional layer should eliminate the need for complicated "%", "!",
26602660+etc. encodings in <local-part>.
26612661+26622662+The essence of the method is that the <mail-domain> is used as a domain
26632663+name to locate type MX RRs which list hosts willing to accept mail for
26642664+<mail-domain>, together with preference values which rank the hosts
26652665+according to an order specified by the administrators for <mail-domain>.
26662666+26672667+In this memo, the <mail-domain> ISI.EDU is used in examples, together
26682668+with the hosts VENERA.ISI.EDU and VAXA.ISI.EDU as mail exchanges for
26692669+ISI.EDU. If a mailer had a message for Mockapetris@ISI.EDU, it would
26702670+route it by looking up MX RRs for ISI.EDU. The MX RRs at ISI.EDU name
26712671+VENERA.ISI.EDU and VAXA.ISI.EDU, and type A queries can find the host
26722672+addresses.
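
As an illustration of ranking mail exchanges, the sketch below sorts MX data by preference, with lower values tried first. The function name and the preference values 10 and 20 in the usage note are hypothetical; the text above does not state the preferences for ISI.EDU.

```python
def order_exchanges(mx_rrs):
    """Return exchange hosts ordered by preference (lowest value first).

    `mx_rrs` is a list of (preference, exchange) tuples taken from MX RRs.
    """
    return [host for _, host in sorted(mx_rrs, key=lambda rr: rr[0])]
```

Given [(20, "VAXA.ISI.EDU"), (10, "VENERA.ISI.EDU")], the mailer would try VENERA.ISI.EDU first and then look up type A records for each host in turn.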
26732673+26742674+8.2. Mailbox binding (Experimental)
26752675+26762676+In mailbox binding, the mailer uses the entire mail destination
26772677+specification to construct a domain name. The encoded domain name for
26782678+the mailbox is used as the QNAME field in a QTYPE=MAILB query.
26792679+26802680+Several outcomes are possible for this query:
26872687+26882688+26892689+ 1. The query can return a name error indicating that the mailbox
26902690+ does not exist as a domain name.
26912691+26922692+ In the long term, this would indicate that the specified
26932693+ mailbox doesn't exist. However, until the use of mailbox
26942694+ binding is universal, this error condition should be
26952695+ interpreted to mean that the organization identified by the
26962696+ global part does not support mailbox binding. The
26972697+ appropriate procedure is to revert to exchange binding at
26982698+ this point.
26992699+27002700+ 2. The query can return a Mail Rename (MR) RR.
27022702+ The MR RR carries a new mailbox specification in its RDATA
27032703+ field. The mailer should replace the old mailbox with the
27042704+ new one and retry the operation.
27052705+27062706+ 3. The query can return a MB RR.
27072707+27082708+ The MB RR carries a domain name for a host in its RDATA
27092709+ field. The mailer should deliver the message to that host
27102710+ via whatever protocol is applicable, e.g., SMTP.
27112711+27122712+ 4. The query can return one or more Mail Group (MG) RRs.
27132713+27142714+ This condition means that the mailbox was actually a mailing
27152715+ list or mail group, rather than a single mailbox. Each MG RR
27162716+ has a RDATA field that identifies a mailbox that is a member
27172717+ of the group. The mailer should deliver a copy of the
27182718+ message to each member.
27192719+27202720+ 5. The query can return a MB RR as well as one or more MG RRs.
27222722+ This condition means that the mailbox was actually a mailing
27232723+ list. The mailer can either deliver the message to the host
27242724+ specified by the MB RR, which will in turn do the delivery to
27252725+ all members, or the mailer can use the MG RRs to do the
27262726+ expansion itself.
27272727+27282728+In any of these cases, the response may include a Mail Information
27292729+(MINFO) RR. This RR is usually associated with a mail group, but is
27302730+legal with a MB. The MINFO RR identifies two mailboxes. One of these
27312731+identifies a responsible person for the original mailbox name. This
27322732+mailbox should be used for requests to be added to a mail group, etc.
27332733+The second mailbox name in the MINFO RR identifies a mailbox that should
27342734+receive error messages for mail failures. This is particularly
27352735+appropriate for mailing lists when errors in member names should be
27362736+reported to a person other than the one who sends a message to the list.
27432743+27442744+27452745+New fields may be added to this RR in the future.
27462746+27472747+27482748+9. REFERENCES and BIBLIOGRAPHY
27492749+27502750+[Dyer 87] S. Dyer, F. Hsu, "Hesiod", Project Athena
27512751+ Technical Plan - Name Service, April 1987, version 1.9.
27522752+27532753+ Describes the fundamentals of the Hesiod name service.
27542754+27552755+[IEN-116] J. Postel, "Internet Name Server", IEN-116,
27562756+ USC/Information Sciences Institute, August 1979.
27572757+27582758+ A name service obsoleted by the Domain Name System, but
27592759+ still in use.
27602760+27612761+[Quarterman 86] J. Quarterman, and J. Hoskins, "Notable Computer Networks",
27622762+ Communications of the ACM, October 1986, volume 29, number
27632763+ 10.
27642764+27652765+[RFC-742] K. Harrenstien, "NAME/FINGER", RFC-742, Network
27662766+ Information Center, SRI International, December 1977.
27672767+27682768+[RFC-768] J. Postel, "User Datagram Protocol", RFC-768,
27692769+ USC/Information Sciences Institute, August 1980.
27702770+27712771+[RFC-793] J. Postel, "Transmission Control Protocol", RFC-793,
27722772+ USC/Information Sciences Institute, September 1981.
27732773+27742774+[RFC-799] D. Mills, "Internet Name Domains", RFC-799, COMSAT,
27752775+ September 1981.
27762776+27772777+ Suggests introduction of a hierarchy in place of a flat
27782778+ name space for the Internet.
27792779+27802780+[RFC-805] J. Postel, "Computer Mail Meeting Notes", RFC-805,
27812781+ USC/Information Sciences Institute, February 1982.
27822782+27832783+[RFC-810] E. Feinler, K. Harrenstien, Z. Su, and V. White, "DOD
27842784+ Internet Host Table Specification", RFC-810, Network
27852785+ Information Center, SRI International, March 1982.
27862786+27872787+ Obsolete. See RFC-952.
27882788+27892789+[RFC-811] K. Harrenstien, V. White, and E. Feinler, "Hostnames
27902790+ Server", RFC-811, Network Information Center, SRI
27912791+ International, March 1982.
27992799+28002800+28012801+ Obsolete. See RFC-953.
28022802+28032803+[RFC-812] K. Harrenstien, and V. White, "NICNAME/WHOIS", RFC-812,
28042804+ Network Information Center, SRI International, March
28052805+ 1982.
28062806+28072807+[RFC-819] Z. Su, and J. Postel, "The Domain Naming Convention for
28082808+ Internet User Applications", RFC-819, Network
28092809+ Information Center, SRI International, August 1982.
28102810+28112811+ Early thoughts on the design of the domain system.
28122812+ Current implementation is completely different.
28132813+28142814+[RFC-821] J. Postel, "Simple Mail Transfer Protocol", RFC-821,
28152815+ USC/Information Sciences Institute, August 1982.
28162816+28172817+[RFC-830] Z. Su, "A Distributed System for Internet Name Service",
28182818+ RFC-830, Network Information Center, SRI International,
28192819+ October 1982.
28202820+28212821+ Early thoughts on the design of the domain system.
28222822+ Current implementation is completely different.
28232823+28242824+[RFC-882] P. Mockapetris, "Domain names - Concepts and
28252825+ Facilities," RFC-882, USC/Information Sciences
28262826+ Institute, November 1983.
28282828+28292829+ Superseded by this memo.
28292829+28302830+[RFC-883] P. Mockapetris, "Domain names - Implementation and
28312831+ Specification," RFC-883, USC/Information Sciences
28322832+ Institute, November 1983.
28342834+28352835+ Superseded by this memo.
28352835+28362836+[RFC-920] J. Postel and J. Reynolds, "Domain Requirements",
28372837+ RFC-920, USC/Information Sciences Institute,
28382838+ October 1984.
28392839+28402840+ Explains the naming scheme for top level domains.
28412841+28422842+[RFC-952] K. Harrenstien, M. Stahl, E. Feinler, "DoD Internet Host
28432843+ Table Specification", RFC-952, SRI, October 1985.
28442844+28452845+ Specifies the format of HOSTS.TXT, the host/address
28462846+ table replaced by the DNS.
28552855+28562856+28572857+[RFC-953] K. Harrenstien, M. Stahl, E. Feinler, "HOSTNAME Server",
28582858+ RFC-953, SRI, October 1985.
28592859+28602860+ This RFC contains the official specification of the
28612861+ hostname server protocol, which is obsoleted by the DNS.
28622862+ This TCP based protocol accesses information stored in
28632863+ the RFC-952 format, and is used to obtain copies of the
28642864+ host table.
28652865+28662866+[RFC-973] P. Mockapetris, "Domain System Changes and
28672867+ Observations", RFC-973, USC/Information Sciences
28682868+ Institute, January 1986.
28692869+28702870+ Describes changes to RFC-882 and RFC-883 and reasons for
28712871+ them.
28722872+28732873+[RFC-974] C. Partridge, "Mail routing and the domain system",
28742874+ RFC-974, CSNET CIC BBN Labs, January 1986.
28752875+28762876+ Describes the transition from HOSTS.TXT based mail
28772877+ addressing to the more powerful MX system used with the
28782878+ domain system.
28792879+28802880+[RFC-1001] NetBIOS Working Group, "Protocol standard for a NetBIOS
28812881+ service on a TCP/UDP transport: Concepts and Methods",
28822882+ RFC-1001, March 1987.
28832883+28842884+ This RFC and RFC-1002 are a preliminary design for
28852885+ NETBIOS on top of TCP/IP which proposes to base NetBIOS
28862886+ name service on top of the DNS.
28872887+28882888+[RFC-1002] NetBIOS Working Group, "Protocol standard for a NetBIOS
28892889+ service on a TCP/UDP transport: Detailed
28902890+ Specifications", RFC-1002, March 1987.
28912891+28922892+[RFC-1010] J. Reynolds, and J. Postel, "Assigned Numbers", RFC-1010,
28932893+ USC/Information Sciences Institute, May 1987.
28942894+28952895+ Contains socket numbers and mnemonics for host names,
28962896+ operating systems, etc.
28972897+28982898+[RFC-1031] W. Lazear, "MILNET Name Domain Transition", RFC-1031,
28992899+ November 1987.
29002900+29012901+ Describes a plan for converting the MILNET to the DNS.
29022902+29032903+[RFC-1032] M. Stahl, "Establishing a Domain - Guidelines for
29042904+ Administrators", RFC-1032, November 1987.
29112911+29122912+29132913+ Describes the registration policies used by the NIC to
29142914+ administer the top level domains and delegate subzones.
29152915+29162916+[RFC-1033] M. Lottor, "Domain Administrators Operations Guide",
29172917+ RFC-1033, November 1987.
29182918+29192919+ A cookbook for domain administrators.
29202920+29212921+[Solomon 82] M. Solomon, L. Landweber, and D. Neuhengen, "The CSNET
29222922+ Name Server", Computer Networks, vol 6, nr 3, July 1982.
29232923+29242924+ Describes a name service for CSNET which is independent
29252925+ from the DNS and DNS use in the CSNET.
29672967+29682968+29692969+Index
29702970+29712971+ * 13
29722972+29732973+ ; 33, 35
29742974+29752975+ <character-string> 35
29762976+ <domain-name> 34
29772977+29782978+ @ 35
29792979+29802980+ \ 35
29812981+29822982+ A 12
29832983+29842984+ Byte order 8
29852985+29862986+ CH 13
29872987+ Character case 9
29882988+ CLASS 11
29892989+ CNAME 12
29902990+ Completion 42
29912991+ CS 13
29922992+29932993+ Hesiod 13
29942994+ HINFO 12
29952995+ HS 13
29962996+29972997+ IN 13
29982998+ IN-ADDR.ARPA domain 22
29992999+ Inverse queries 40
30003000+30013001+ Mailbox names 47
30023002+ MB 12
30033003+ MD 12
30043004+ MF 12
30053005+ MG 12
30063006+ MINFO 12
30073007+ MINIMUM 20
30083008+ MR 12
30093009+ MX 12
30103010+30113011+ NS 12
30123012+ NULL 12
30133013+30143014+ Port numbers 32
30153015+ Primary server 5
30163016+ PTR 12, 18
30233023+30243024+30253025+ QCLASS 13
30263026+ QTYPE 12
30273027+30283028+ RDATA 12
30293029+ RDLENGTH 11
30303030+30313031+ Secondary server 5
30323032+ SOA 12
30333033+ Stub resolvers 7
30343034+30353035+ TCP 32
30363036+ TXT 12
30373037+ TYPE 11
30383038+30393039+ UDP 32
30403040+30413041+ WKS 12
30773077+
+1963
spec/rfc3492.txt
···11+22+33+44+55+66+77+Network Working Group A. Costello
88+Request for Comments: 3492 Univ. of California, Berkeley
99+Category: Standards Track March 2003
1010+1111+1212+ Punycode: A Bootstring encoding of Unicode
1313+ for Internationalized Domain Names in Applications (IDNA)
1414+1515+Status of this Memo
1616+1717+ This document specifies an Internet standards track protocol for the
1818+ Internet community, and requests discussion and suggestions for
1919+ improvements. Please refer to the current edition of the "Internet
2020+ Official Protocol Standards" (STD 1) for the standardization state
2121+ and status of this protocol. Distribution of this memo is unlimited.
2222+2323+Copyright Notice
2424+2525+ Copyright (C) The Internet Society (2003). All Rights Reserved.
2626+2727+Abstract
2828+2929+ Punycode is a simple and efficient transfer encoding syntax designed
3030+ for use with Internationalized Domain Names in Applications (IDNA).
3131+ It uniquely and reversibly transforms a Unicode string into an ASCII
3232+ string. ASCII characters in the Unicode string are represented
3333+ literally, and non-ASCII characters are represented by ASCII
3434+ characters that are allowed in host name labels (letters, digits, and
3535+ hyphens). This document defines a general algorithm called
3636+ Bootstring that allows a string of basic code points to uniquely
3737+ represent any string of code points drawn from a larger set.
3838+ Punycode is an instance of Bootstring that uses particular parameter
3939+ values specified by this document, appropriate for IDNA.
4040+4141+Table of Contents
4242+4343+ 1. Introduction...............................................2
4444+ 1.1 Features..............................................2
4545+ 1.2 Interaction of protocol parts.........................3
4646+ 2. Terminology................................................3
4747+ 3. Bootstring description.....................................4
4848+ 3.1 Basic code point segregation..........................4
4949+ 3.2 Insertion unsort coding...............................4
5050+ 3.3 Generalized variable-length integers..................5
5151+ 3.4 Bias adaptation.......................................7
5252+ 4. Bootstring parameters......................................8
5353+ 5. Parameter values for Punycode..............................8
5454+ 6. Bootstring algorithms......................................9
5555+5656+5757+5858+Costello Standards Track [Page 1]
5959+6060+RFC 3492 IDNA Punycode March 2003
6161+6262+6363+ 6.1 Bias adaptation function.............................10
6464+ 6.2 Decoding procedure...................................11
6565+ 6.3 Encoding procedure...................................12
6666+ 6.4 Overflow handling....................................13
6767+ 7. Punycode examples.........................................14
6868+ 7.1 Sample strings.......................................14
6969+ 7.2 Decoding traces......................................17
7070+ 7.3 Encoding traces......................................19
7171+ 8. Security Considerations...................................20
7272+ 9. References................................................21
7373+ 9.1 Normative References.................................21
7474+ 9.2 Informative References...............................21
7575+ A. Mixed-case annotation.....................................22
7676+ B. Disclaimer and license....................................22
7777+ C. Punycode sample implementation............................23
7878+ Author's Address.............................................34
7979+ Full Copyright Statement.....................................35
8080+8181+1. Introduction
8282+8383+ [IDNA] describes an architecture for supporting internationalized
8484+ domain names. Labels containing non-ASCII characters can be
8585+ represented by ACE labels, which begin with a special ACE prefix and
8686+ contain only ASCII characters. The remainder of the label after the
8787+ prefix is a Punycode encoding of a Unicode string satisfying certain
8888+ constraints. For the details of the prefix and constraints, see
8989+ [IDNA] and [NAMEPREP].
9090+9191+ Punycode is an instance of a more general algorithm called
9292+ Bootstring, which allows strings composed from a small set of "basic"
9393+ code points to uniquely represent any string of code points drawn
9494+ from a larger set. Punycode is Bootstring with particular parameter
9595+ values appropriate for IDNA.
9696+9797+1.1 Features
9898+9999+ Bootstring has been designed to have the following features:
100100+101101+ * Completeness: Every extended string (sequence of arbitrary code
102102+ points) can be represented by a basic string (sequence of basic
103103+ code points). Restrictions on what strings are allowed, and on
104104+ length, can be imposed by higher layers.
105105+106106+ * Uniqueness: There is at most one basic string that represents a
107107+ given extended string.
108108+109109+ * Reversibility: Any extended string mapped to a basic string can
110110+ be recovered from that basic string.
117117+118118+119119+ * Efficient encoding: The ratio of basic string length to extended
120120+ string length is small. This is important in the context of
121121+ domain names because RFC 1034 [RFC1034] restricts the length of a
122122+ domain label to 63 characters.
123123+124124+ * Simplicity: The encoding and decoding algorithms are reasonably
125125+ simple to implement. The goals of efficiency and simplicity are
126126+ at odds; Bootstring aims at a good balance between them.
127127+128128+ * Readability: Basic code points appearing in the extended string
129129+ are represented as themselves in the basic string (although the
130130+ main purpose is to improve efficiency, not readability).
131131+132132+ Punycode can also support an additional feature that is not used by
133133+ the ToASCII and ToUnicode operations of [IDNA]. When extended
134134+ strings are case-folded prior to encoding, the basic string can use
135135+ mixed case to tell how to convert the folded string into a mixed-case
136136+ string. See appendix A "Mixed-case annotation".
137137+138138+1.2 Interaction of protocol parts
139139+140140+ Punycode is used by the IDNA protocol [IDNA] for converting domain
141141+ labels into ASCII; it is not designed for any other purpose. It is
142142+ explicitly not designed for processing arbitrary free text.
143143+144144+2. Terminology
145145+146146+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
147147+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
148148+ document are to be interpreted as described in BCP 14, RFC 2119
149149+ [RFC2119].
150150+151151+ A code point is an integral value associated with a character in a
152152+ coded character set.
153153+154154+ As in the Unicode Standard [UNICODE], Unicode code points are denoted
155155+ by "U+" followed by four to six hexadecimal digits, while a range of
156156+ code points is denoted by two hexadecimal numbers separated by "..",
157157+ with no prefixes.
158158+159159+ The operators div and mod perform integer division; (x div y) is the
160160+ quotient of x divided by y, discarding the remainder, and (x mod y)
161161+ is the remainder, so (x div y) * y + (x mod y) == x. Bootstring uses
162162+ these operators only with nonnegative operands, so the quotient and
163163+ remainder are always nonnegative.
164164+165165+ The break statement jumps out of the innermost loop (as in C).
173173+174174+175175+ An overflow is an attempt to compute a value that exceeds the maximum
176176+ value of an integer variable.
177177+178178+3. Bootstring description
179179+180180+ Bootstring represents an arbitrary sequence of code points (the
181181+ "extended string") as a sequence of basic code points (the "basic
182182+ string"). This section describes the representation. Section 6
183183+ "Bootstring algorithms" presents the algorithms as pseudocode.
184184+ Sections 7.2 "Decoding traces" and 7.3 "Encoding traces" trace the
185185+ algorithms for sample inputs.
186186+187187+ The following sections describe the four techniques used in
188188+ Bootstring. "Basic code point segregation" is a very simple and
189189+ efficient encoding for basic code points occurring in the extended
190190+ string: they are simply copied all at once. "Insertion unsort
191191+ coding" encodes the non-basic code points as deltas, and processes
192192+ the code points in numerical order rather than in order of
193193+ appearance, which typically results in smaller deltas. The deltas
194194+ are represented as "generalized variable-length integers", which use
195195+ basic code points to represent nonnegative integers. The parameters
196196+ of this integer representation are dynamically adjusted using "bias
197197+ adaptation", to improve efficiency when consecutive deltas have
198198+ similar magnitudes.
199199+200200+3.1 Basic code point segregation
201201+202202+ All basic code points appearing in the extended string are
203203+ represented literally at the beginning of the basic string, in their
204204+ original order, followed by a delimiter if (and only if) the number
205205+ of basic code points is nonzero. The delimiter is a particular basic
206206+ code point, which never appears in the remainder of the basic string.
207207+ The decoder can therefore find the end of the literal portion (if
208208+ there is one) by scanning for the last delimiter.
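
As a rough illustration of the decoder's side of this rule, the Python sketch below splits a basic string at the last delimiter. The name `split_literal` is hypothetical, and the delimiter is assumed to be the hyphen-minus that Punycode uses (section 5). Following the decoding procedure in section 6.2, a delimiter in position 0 is not treated as a separator, because a delimiter is only emitted when at least one basic code point precedes it.

```python
DELIMITER = "-"  # assumption: Punycode's delimiter, U+002D (section 5)


def split_literal(basic: str) -> tuple[str, str]:
    """Return (literal portion, encoded deltas) of a basic string."""
    last = basic.rfind(DELIMITER)
    if last > 0:
        # Everything before the last delimiter was copied literally.
        return basic[:last], basic[last + 1:]
    # No delimiter, or only a leading one: nothing was copied literally,
    # so the whole string is left for the delta decoder.
    return "", basic
```

For example, `split_literal("bcher-kva")` separates the literal `bcher` from the encoded deltas `kva`.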
209209+210210+3.2 Insertion unsort coding
211211+212212+ The remainder of the basic string (after the last delimiter if there
213213+ is one) represents a sequence of nonnegative integral deltas as
214214+ generalized variable-length integers, described in section 3.3. The
215215+ meaning of the deltas is best understood in terms of the decoder.
216216+217217+ The decoder builds the extended string incrementally. Initially, the
218218+ extended string is a copy of the literal portion of the basic string
219219+ (excluding the last delimiter). The decoder inserts non-basic code
220220+ points, one for each delta, into the extended string, ultimately
221221+ arriving at the final decoded string.
229229+230230+231231+ At the heart of this process is a state machine with two state
232232+ variables: an index i and a counter n. The index i refers to a
233233+ position in the extended string; it ranges from 0 (the first
234234+ position) to the current length of the extended string (which refers
235235+ to a potential position beyond the current end). If the current
236236+ state is <n,i>, the next state is <n,i+1> if i is less than the
237237+ length of the extended string, or <n+1,0> if i equals the length of
238238+ the extended string. In other words, each state change causes i to
239239+ increment, wrapping around to zero if necessary, and n counts the
240240+ number of wrap-arounds.
241241+242242+ Notice that the state always advances monotonically (there is no way
243243+ for the decoder to return to an earlier state). At each state, an
244244+ insertion is either performed or not performed. At most one
245245+ insertion is performed in a given state. An insertion inserts the
246246+ value of n at position i in the extended string. The deltas are a
247247+ run-length encoding of this sequence of events: they are the lengths
248248+ of the runs of non-insertion states preceding the insertion states.
249249+ Hence, for each delta, the decoder performs delta state changes, then
250250+ an insertion, and then one more state change. (An implementation
251251+ need not perform each state change individually, but can instead use
252252+ division and remainder calculations to compute the next insertion
253253+ state directly.) It is an error if the inserted code point is a
254254+ basic code point (because basic code points were supposed to be
255255+ segregated as described in section 3.1).
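
The state machine can be sketched in Python in two ways: one that performs every state change individually, and one that uses the division and remainder shortcut mentioned above. Both function names are hypothetical, the deltas are assumed to be already decoded into integers, and `initial_n` defaults to the Punycode value from section 5. The check that an inserted code point is not basic is left out for brevity.

```python
def insert_stepwise(literal, deltas, initial_n=0x80):
    """Apply each delta one state change at a time (slow but literal)."""
    out = [ord(c) for c in literal]
    n, i = initial_n, 0
    for delta in deltas:
        for _ in range(delta):
            if i < len(out):
                i += 1            # <n,i> -> <n,i+1>
            else:
                n, i = n + 1, 0   # <n,i> -> <n+1,0> (wrap-around)
        out.insert(i, n)          # the insertion itself
        i += 1                    # one more state change after inserting
    return "".join(map(chr, out))


def insert_direct(literal, deltas, initial_n=0x80):
    """Same result, computing the next insertion state with div and mod."""
    out = [ord(c) for c in literal]
    n, i = initial_n, 0
    for delta in deltas:
        i += delta
        n += i // (len(out) + 1)  # number of wrap-arounds
        i %= len(out) + 1         # position within the current string
        out.insert(i, n)
        i += 1
    return "".join(map(chr, out))
```

With the literal portion `bcher` and the single delta 745, both functions insert U+00FC at position 1 and produce `bücher`.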
256256+257257+ The encoder's main task is to derive the sequence of deltas that will
258258+ cause the decoder to construct the desired string. It can do this by
259259+ repeatedly scanning the extended string for the next code point that
260260+ the decoder would need to insert, and counting the number of state
261261+ changes the decoder would need to perform, mindful of the fact that
262262+ the decoder's extended string will include only those code points
263263+ that have already been inserted. Section 6.3 "Encoding procedure"
264264+ gives a precise algorithm.
265265+266266+3.3 Generalized variable-length integers
267267+268268+ In a conventional integer representation the base is the number of
269269+ distinct symbols for digits, whose values are 0 through base-1. Let
270270+ digit_0 denote the least significant digit, digit_1 the next least
271271+ significant, and so on. The value represented is the sum over j of
272272+ digit_j * w(j), where w(j) = base^j is the weight (scale factor) for
273273+ position j. For example, in the base 8 integer 437, the digits are
274274+ 7, 3, and 4, and the weights are 1, 8, and 64, so the value is 7 +
275275+ 3*8 + 4*64 = 287. This representation has two disadvantages: First,
276276+ there are multiple encodings of each value (because there can be
277277+ extra zeros in the most significant positions), which is inconvenient
285285+286286+287287+ when unique encodings are needed. Second, the integer is not self-
288288+ delimiting, so if multiple integers are concatenated the boundaries
289289+ between them are lost.
290290+291291+ The generalized variable-length representation solves these two
292292+ problems. The digit values are still 0 through base-1, but now the
293293+ integer is self-delimiting by means of thresholds t(j), each of which
294294+ is in the range 0 through base-1. Exactly one digit, the most
295295+ significant, satisfies digit_j < t(j). Therefore, if several
296296+ integers are concatenated, it is easy to separate them, starting with
297297+ the first if they are little-endian (least significant digit first),
298298+ or starting with the last if they are big-endian (most significant
299299+ digit first). As before, the value is the sum over j of digit_j *
300300+ w(j), but the weights are different:
301301+302302+ w(0) = 1
303303+ w(j) = w(j-1) * (base - t(j-1)) for j > 0
304304+305305+ For example, consider the little-endian sequence of base 8 digits
306306+ 734251... Suppose the thresholds are 2, 3, 5, 5, 5, 5... This
307307+ implies that the weights are 1, 1*(8-2) = 6, 6*(8-3) = 30, 30*(8-5) =
308308+ 90, 90*(8-5) = 270, and so on. 7 is not less than 2, and 3 is not
309309+ less than 3, but 4 is less than 5, so 4 is the last digit. The value
310310+ of 734 is 7*1 + 3*6 + 4*30 = 145. The next integer is 251, with
311311+ value 2*1 + 5*6 + 1*30 = 62. Decoding this representation is very
312312+ similar to decoding a conventional integer: Start with a current
313313+ value of N = 0 and a weight w = 1. Fetch the next digit d and
314314+ increase N by d * w. If d is less than the current threshold (t)
315315+ then stop, otherwise increase w by a factor of (base - t), update t
316316+ for the next position, and repeat.
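
The worked example above can be reproduced with a short Python sketch. The function name `decode_integers` is hypothetical, and the thresholds are passed as a plain list whose last entry is assumed to repeat for all later positions; Bootstring itself derives them from the bias, as described further on.

```python
def decode_integers(digits, thresholds, base):
    """Split a little-endian digit sequence into the integers it encodes."""
    values = []
    pos = 0
    while pos < len(digits):
        value, weight, j = 0, 1, 0
        while True:
            d = digits[pos]  # raises IndexError if the last integer is cut off
            pos += 1
            # Past the end of the list, keep using the final threshold.
            t = thresholds[min(j, len(thresholds) - 1)]
            value += d * weight
            if d < t:        # the most significant digit has been reached
                break
            weight *= base - t
            j += 1
        values.append(value)
    return values


print(decode_integers([7, 3, 4, 2, 5, 1], [2, 3, 5], base=8))  # [145, 62]
```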
317317+318318+ Encoding this representation is similar to encoding a conventional
319319+ integer: If N < t then output one digit for N and stop, otherwise
320320+ output the digit for t + ((N - t) mod (base - t)), then replace N
321321+ with (N - t) div (base - t), update t for the next position, and
322322+ repeat.
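
A minimal Python sketch of this encoding step, using the same base 8 example and thresholds as the decoding discussion. The name `encode_integer` is hypothetical, and the last entry of the threshold list is assumed to repeat for all later positions.

```python
def encode_integer(value, thresholds, base):
    """Return the little-endian digits of one nonnegative integer."""
    digits = []
    j = 0
    while True:
        # Past the end of the list, keep using the final threshold.
        t = thresholds[min(j, len(thresholds) - 1)]
        if value < t:
            digits.append(value)  # final (most significant) digit
            return digits
        digits.append(t + (value - t) % (base - t))
        value = (value - t) // (base - t)
        j += 1


print(encode_integer(145, [2, 3, 5], base=8))  # [7, 3, 4]
print(encode_integer(62, [2, 3, 5], base=8))   # [2, 5, 1]
```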
323323+324324+ For any particular set of values of t(j), there is exactly one
325325+ generalized variable-length representation of each nonnegative
326326+ integral value.
327327+328328+ Bootstring uses little-endian ordering so that the deltas can be
329329+ separated starting with the first. The t(j) values are defined in
330330+ terms of the constants base, tmin, and tmax, and a state variable
331331+ called bias:
332332+333333+ t(j) = base * (j + 1) - bias,
334334+ clamped to the range tmin through tmax
341341+342342+343343+ The clamping means that if the formula yields a value less than tmin
344344+ or greater than tmax, then t(j) = tmin or tmax, respectively. (In
345345+ the pseudocode in section 6 "Bootstring algorithms", the expression
346346+ base * (j + 1) is denoted by k for performance reasons.) These t(j)
347347+ values cause the representation to favor integers within a particular
348348+ range determined by the bias.
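
The clamped threshold can be written in one line of Python. This sketch assumes the Punycode parameter values from section 5 (base = 36, tmin = 1, tmax = 26); the function name `threshold` is hypothetical.

```python
BASE, TMIN, TMAX = 36, 1, 26  # assumption: Punycode values from section 5


def threshold(j, bias):
    """t(j) = base * (j + 1) - bias, clamped to the range tmin..tmax."""
    return max(TMIN, min(TMAX, BASE * (j + 1) - bias))


# With the initial Punycode bias of 72, the first two digit positions get
# the minimum threshold and every later position gets the maximum.
print([threshold(j, 72) for j in range(4)])  # [1, 1, 26, 26]
```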
349349+350350+3.4 Bias adaptation
351351+352352+ After each delta is encoded or decoded, bias is set for the next
353353+ delta as follows:
354354+355355+ 1. Delta is scaled in order to avoid overflow in the next step:
356356+357357+ let delta = delta div 2
358358+359359+ But when this is the very first delta, the divisor is not 2, but
360360+ instead a constant called damp. This compensates for the fact
361361+ that the second delta is usually much smaller than the first.
362362+363363+ 2. Delta is increased to compensate for the fact that the next delta
364364+ will be inserting into a longer string:
365365+366366+ let delta = delta + (delta div numpoints)
367367+368368+ numpoints is the total number of code points encoded/decoded so
369369+ far (including the one corresponding to this delta itself, and
370370+ including the basic code points).
371371+372372+ 3. Delta is repeatedly divided until it falls within a threshold, to
373373+ predict the minimum number of digits needed to represent the next
374374+ delta:
375375+376376+ while delta > ((base - tmin) * tmax) div 2
377377+ do let delta = delta div (base - tmin)
378378+379379+ 4. The bias is set:
380380+381381+ let bias =
382382+ (base * the number of divisions performed in step 3) +
383383+ (((base - tmin + 1) * delta) div (delta + skew))
384384+385385+ The motivation for this procedure is that the current delta
386386+ provides a hint about the likely size of the next delta, and so
387387+ t(j) is set to tmax for the more significant digits starting with
388388+ the one expected to be last, tmin for the less significant digits
389389+ up through the one expected to be third-last, and somewhere
390390+ between tmin and tmax for the digit expected to be second-last
397397+398398+399399+ (balancing the hope of the expected-last digit being unnecessary
400400+ against the danger of it being insufficient).
401401+402402+4. Bootstring parameters
403403+404404+ Given a set of basic code points, one needs to be designated as the
405405+ delimiter. The base cannot be greater than the number of
406406+ distinguishable basic code points remaining. The digit-values in the
407407+ range 0 through base-1 need to be associated with distinct non-
408408+ delimiter basic code points. In some cases multiple code points need
409409+ to have the same digit-value; for example, uppercase and lowercase
410410+ versions of the same letter need to be equivalent if basic strings
411411+ are case-insensitive.
412412+413413+ The initial value of n cannot be greater than the minimum non-basic
414414+ code point that could appear in extended strings.
415415+416416+ The remaining five parameters (tmin, tmax, skew, damp, and the
417417+ initial value of bias) need to satisfy the following constraints:
418418+419419+ 0 <= tmin <= tmax <= base-1
420420+ skew >= 1
421421+ damp >= 2
422422+ initial_bias mod base <= base - tmin
423423+424424+ Provided the constraints are satisfied, these five parameters affect
425425+ efficiency but not correctness. They are best chosen empirically.
426426+427427+ If support for mixed-case annotation is desired (see appendix A),
428428+ make sure that the code points corresponding to 0 through tmax-1 all
429429+ have both uppercase and lowercase forms.
430430+431431+5. Parameter values for Punycode
432432+433433+ Punycode uses the following Bootstring parameter values:
434434+435435+ base = 36
436436+ tmin = 1
437437+ tmax = 26
438438+ skew = 38
439439+ damp = 700
440440+ initial_bias = 72
441441+ initial_n = 128 = 0x80
442442+443443+ Although the only restriction Punycode imposes on the input integers
444444+ is that they be nonnegative, these parameters are especially designed
445445+ to work well with Unicode [UNICODE] code points, which are integers
446446+ in the range 0..10FFFF (but not D800..DFFF, which are reserved for
453453+454454+455455+ use by the UTF-16 encoding of Unicode). The basic code points are
456456+ the ASCII [ASCII] code points (0..7F), of which U+002D (-) is the
457457+ delimiter, and some of the others have digit-values as follows:
458458+459459+ code points digit-values
460460+ ------------ ----------------------
461461+ 41..5A (A-Z) = 0 to 25, respectively
462462+ 61..7A (a-z) = 0 to 25, respectively
463463+ 30..39 (0-9) = 26 to 35, respectively
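
The table translates directly into a pair of Python helpers. The names `digit_value` and `digit_char` are hypothetical; `digit_char` emits lowercase letters only, which is one of the output forms an encoder can choose.

```python
def digit_value(ch):
    """Return the digit-value 0..35 of a basic code point, or None."""
    cp = ord(ch)
    if 0x41 <= cp <= 0x5A:   # A-Z -> 0..25
        return cp - 0x41
    if 0x61 <= cp <= 0x7A:   # a-z -> 0..25
        return cp - 0x61
    if 0x30 <= cp <= 0x39:   # 0-9 -> 26..35
        return cp - 0x30 + 26
    return None              # e.g. the delimiter has no digit-value


def digit_char(value):
    """Return the lowercase basic code point for a digit-value 0..35."""
    if not 0 <= value <= 35:
        raise ValueError("digit-value out of range")
    return chr(0x61 + value) if value < 26 else chr(0x30 + value - 26)
```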
464464+465465+ Using hyphen-minus as the delimiter implies that the encoded string
466466+ can end with a hyphen-minus only if the Unicode string consists
467467+ entirely of basic code points, but IDNA forbids such strings from
468468+ being encoded. The encoded string can begin with a hyphen-minus, but
469469+ IDNA prepends a prefix. Therefore IDNA using Punycode conforms to
470470+ the RFC 952 rule that host name labels neither begin nor end with a
471471+ hyphen-minus [RFC952].
472472+473473+ A decoder MUST recognize the letters in both uppercase and lowercase
474474+ forms (including mixtures of both forms). An encoder SHOULD output
475475+ only uppercase forms or only lowercase forms, unless it uses mixed-
476476+ case annotation (see appendix A).
477477+478478+ Presumably most users will not manually write or type encoded strings
479479+ (as opposed to cutting and pasting them), but those who do will need
480480+ to be alert to the potential visual ambiguity between the following
481481+ sets of characters:
482482+483483+ G 6
484484+ I l 1
485485+ O 0
486486+ S 5
487487+ U V
488488+ Z 2
489489+490490+ Such ambiguities are usually resolved by context, but in a Punycode
491491+ encoded string there is no context apparent to humans.
492492+493493+6. Bootstring algorithms
494494+495495+ Some parts of the pseudocode can be omitted if the parameters satisfy
496496+ certain conditions (for which Punycode qualifies). These parts are
497497+ enclosed in {braces}, and notes immediately following the pseudocode
498498+ explain the conditions under which they can be omitted.
509509+510510+511511+ Formally, code points are integers, and hence the pseudocode assumes
512512+ that arithmetic operations can be performed directly on code points.
513513+ In some programming languages, explicit conversion between code
514514+ points and integers might be necessary.
515515+516516+6.1 Bias adaptation function
517517+518518+ function adapt(delta,numpoints,firsttime):
519519+ if firsttime then let delta = delta div damp
520520+ else let delta = delta div 2
521521+ let delta = delta + (delta div numpoints)
522522+ let k = 0
523523+ while delta > ((base - tmin) * tmax) div 2 do begin
524524+ let delta = delta div (base - tmin)
525525+ let k = k + base
526526+ end
527527+ return k + (((base - tmin + 1) * delta) div (delta + skew))
528528+529529+ It does not matter whether the modifications to delta and k inside
530530+ adapt() affect variables of the same name inside the
531531+ encoding/decoding procedures, because after calling adapt() the
532532+ caller does not read those variables before overwriting them.
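The adapt() pseudocode above can be sketched in C as follows. This is a minimal illustration, not the sample implementation from appendix C: the parameter values are the Punycode ones, and the use of unsigned long is an assumption made for the example. Because C passes arguments by value, changes to delta inside the function never reach the caller, which is the situation the preceding paragraph describes.

```c
/* Punycode parameter values (RFC 3492, section 5). */
enum { base = 36, tmin = 1, tmax = 26, skew = 38, damp = 700 };

/* Bias adaptation, following the pseudocode in section 6.1.
   delta and numpoints are local copies of the caller's values. */
unsigned long adapt(unsigned long delta, unsigned long numpoints,
                    int firsttime)
{
    unsigned long k = 0;

    delta = firsttime ? delta / damp : delta / 2;
    delta += delta / numpoints;

    /* Scale delta down until it fits the range of the final formula. */
    while (delta > ((base - tmin) * tmax) / 2) {
        delta /= base - tmin;
        k += base;
    }
    return k + ((base - tmin + 1) * delta) / (delta + skew);
}
```

Called with the first delta of example B in section 7.1, adapt(19853, 1, 1) yields 21, which matches the "bias becomes 21" step of the decoding trace in section 7.2.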
565565+566566+567567+6.2 Decoding procedure
568568+569569+ let n = initial_n
570570+ let i = 0
571571+ let bias = initial_bias
572572+ let output = an empty string indexed from 0
573573+ consume all code points before the last delimiter (if there is one)
574574+ and copy them to output, fail on any non-basic code point
575575+ if more than zero code points were consumed then consume one more
576576+ (which will be the last delimiter)
577577+ while the input is not exhausted do begin
578578+ let oldi = i
579579+ let w = 1
580580+ for k = base to infinity in steps of base do begin
581581+ consume a code point, or fail if there was none to consume
582582+ let digit = the code point's digit-value, fail if it has none
583583+ let i = i + digit * w, fail on overflow
584584+ let t = tmin if k <= bias {+ tmin}, or
585585+ tmax if k >= bias + tmax, or k - bias otherwise
586586+ if digit < t then break
587587+ let w = w * (base - t), fail on overflow
588588+ end
589589+ let bias = adapt(i - oldi, length(output) + 1, test oldi is 0?)
590590+ let n = n + i div (length(output) + 1), fail on overflow
591591+ let i = i mod (length(output) + 1)
592592+ {if n is a basic code point then fail}
593593+ insert n into output at position i
594594+ increment i
595595+ end
596596+597597+ The full statement enclosed in braces (checking whether n is a basic
598598+ code point) can be omitted if initial_n exceeds all basic code points
599599+ (which is true for Punycode), because n is never less than initial_n.
600600+601601+ In the assignment of t, where t is clamped to the range tmin through
602602+ tmax, "+ tmin" can always be omitted. This makes the clamping
603603+ calculation incorrect when bias < k < bias + tmin, but that cannot
604604+ happen because of the way bias is computed and because of the
605605+ constraints on the parameters.
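As an illustration of the clamping rule with "+ tmin" omitted, the threshold can be written as a small C function. The name threshold() is hypothetical and the parameter values are the Punycode ones. For Punycode in particular tmin is 1, so no integer k lies strictly between bias and bias + tmin, and the problematic case cannot arise.

```c
/* Punycode parameter values (RFC 3492, section 5). */
enum { tmin = 1, tmax = 26 };

/* Threshold t for digit position k, clamped to the range tmin..tmax.
   The optional "+ tmin" term from the pseudocode is omitted. */
unsigned long threshold(unsigned long k, unsigned long bias)
{
    if (k <= bias) return tmin;
    if (k >= bias + tmax) return tmax;
    return k - bias;
}
```

With the initial bias of 72, the digit positions k = 36 and k = 72 get the threshold tmin, while k = 108 gets tmax.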
606606+607607+ Because the decoder state can only advance monotonically, and there
608608+ is only one representation of any delta, there is therefore only one
609609+ encoded string that can represent a given sequence of integers. The
610610+ only error conditions are invalid code points, unexpected end-of-
611611+ input, overflow, and basic code points encoded using deltas instead
612612+ of appearing literally. If the decoder fails on these errors as
613613+ shown above, then it cannot produce the same output for two distinct
614614+ inputs. Without this property it would have been necessary to re-
623623+ encode the output and verify that it matches the input in order to
624624+ guarantee the uniqueness of the encoding.
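The decoding procedure can be sketched in C as below. This is a simplified illustration rather than the sample implementation of appendix C: the names punycode_decode_sketch() and digit_value() are hypothetical, the input is assumed to be a NUL-terminated string, case flags are ignored, and overflow is detected against an assumed 32-bit limit. Like the pseudocode, it does not check that n stays within 0..10FFFF.

```c
#include <stddef.h>
#include <string.h>

enum { base = 36, tmin = 1, tmax = 26, skew = 38, damp = 700,
       initial_bias = 72, initial_n = 0x80, delimiter = '-' };

/* Bias adaptation (section 6.1). */
static unsigned long adapt(unsigned long delta, unsigned long numpoints,
                           int firsttime)
{
    unsigned long k = 0;
    delta = firsttime ? delta / damp : delta / 2;
    delta += delta / numpoints;
    while (delta > ((base - tmin) * tmax) / 2) {
        delta /= base - tmin;
        k += base;
    }
    return k + ((base - tmin + 1) * delta) / (delta + skew);
}

/* Digit value of an ASCII code point, or base if it has none. */
static unsigned long digit_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0' + 26;
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a';
    return base;
}

/* Decodes the Punycode string in into out (capacity cap code points).
   Returns the number of code points written, or -1 on any failure. */
long punycode_decode_sketch(const char *in, unsigned long *out, size_t cap)
{
    const unsigned long maxint = 0xFFFFFFFFUL; /* assumed 32-bit limit */
    size_t len = strlen(in), pos = 0, outlen = 0;
    const char *last = strrchr(in, delimiter);
    unsigned long n = initial_n, i = 0, bias = initial_bias;

    /* Copy the basic code points that precede the last delimiter. */
    if (last != NULL) {
        for (; in + pos < last; ++pos) {
            if ((unsigned char)in[pos] >= 0x80 || outlen >= cap) return -1;
            out[outlen++] = (unsigned char)in[pos];
        }
        if (pos > 0) ++pos; /* consume the delimiter itself */
    }

    while (pos < len) {
        unsigned long oldi = i, w = 1, k, digit, t;

        /* Decode one generalized variable-length integer into i. */
        for (k = base; ; k += base) {
            if (pos >= len) return -1;
            digit = digit_value((unsigned char)in[pos++]);
            if (digit >= base) return -1;
            if (digit > (maxint - i) / w) return -1;
            i += digit * w;
            t = k <= bias ? tmin : k >= bias + tmax ? tmax : k - bias;
            if (digit < t) break;
            if (w > maxint / (base - t)) return -1;
            w *= base - t;
        }

        bias = adapt(i - oldi, outlen + 1, oldi == 0);
        if (i / (outlen + 1) > maxint - n) return -1;
        n += i / (outlen + 1);
        i %= outlen + 1;

        /* Insert n at position i, shifting the tail right by one. */
        if (outlen >= cap) return -1;
        memmove(out + i + 1, out + i, (outlen - i) * sizeof *out);
        out[i++] = n;
        ++outlen;
    }
    return (long)outlen;
}
```

Applied to example B of section 7.1, the sketch returns 9 code points, starting with 4ED6 and ending with 6587.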
625625+626626+6.3 Encoding procedure
627627+628628+ let n = initial_n
629629+ let delta = 0
630630+ let bias = initial_bias
631631+ let h = b = the number of basic code points in the input
632632+ copy them to the output in order, followed by a delimiter if b > 0
633633+ {if the input contains a non-basic code point < n then fail}
634634+ while h < length(input) do begin
635635+ let m = the minimum {non-basic} code point >= n in the input
636636+ let delta = delta + (m - n) * (h + 1), fail on overflow
637637+ let n = m
638638+ for each code point c in the input (in order) do begin
639639+ if c < n {or c is basic} then increment delta, fail on overflow
640640+ if c == n then begin
641641+ let q = delta
642642+ for k = base to infinity in steps of base do begin
643643+ let t = tmin if k <= bias {+ tmin}, or
644644+ tmax if k >= bias + tmax, or k - bias otherwise
645645+ if q < t then break
646646+ output the code point for digit t + ((q - t) mod (base - t))
647647+ let q = (q - t) div (base - t)
648648+ end
649649+ output the code point for digit q
650650+ let bias = adapt(delta, h + 1, test h equals b?)
651651+ let delta = 0
652652+ increment h
653653+ end
654654+ end
655655+ increment delta and n
656656+ end
657657+658658+ The full statement enclosed in braces (checking whether the input
659659+ contains a non-basic code point less than n) can be omitted if all
660660+ code points less than initial_n are basic code points (which is true
661661+ for Punycode if code points are unsigned).
662662+663663+ The brace-enclosed conditions "non-basic" and "or c is basic" can be
664664+ omitted if initial_n exceeds all basic code points (which is true for
665665+ Punycode), because the code point being tested is never less than
666666+ initial_n.
667667+668668+ In the assignment of t, where t is clamped to the range tmin through
669669+ tmax, "+ tmin" can always be omitted. This makes the clamping
670670+ calculation incorrect when bias < k < bias + tmin, but that cannot
679679+ happen because of the way bias is computed and because of the
680680+ constraints on the parameters.
681681+682682+ The checks for overflow are necessary to avoid producing invalid
683683+ output when the input contains very large values or is very long.
684684+685685+ The increment of delta at the bottom of the outer loop cannot
686686+ overflow because delta < length(input) before the increment, and
687687+ length(input) is already assumed to be representable. The increment
688688+ of n could overflow, but only if h == length(input), in which case
689689+ the procedure is finished anyway.
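The encoding procedure can be sketched in C in the same spirit. The names punycode_encode_sketch() and digit_char() are hypothetical; the sketch assumes input code points no larger than 10FFFF, always emits lowercase digits (no mixed-case annotation), leaves ASCII letters as they are, and checks overflow against an assumed 32-bit limit. All brace-enclosed parts of the pseudocode are omitted, as the notes above permit for Punycode.

```c
#include <stddef.h>

enum { base = 36, tmin = 1, tmax = 26, skew = 38, damp = 700,
       initial_bias = 72, initial_n = 0x80, delimiter = '-' };

/* Bias adaptation (section 6.1). */
static unsigned long adapt(unsigned long delta, unsigned long numpoints,
                           int firsttime)
{
    unsigned long k = 0;
    delta = firsttime ? delta / damp : delta / 2;
    delta += delta / numpoints;
    while (delta > ((base - tmin) * tmax) / 2) {
        delta /= base - tmin;
        k += base;
    }
    return k + ((base - tmin + 1) * delta) / (delta + skew);
}

/* Basic code point for digit d: 0..25 map to a..z, 26..35 to 0..9. */
static char digit_char(unsigned long d)
{
    return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Encodes len code points from in into out (capacity cap characters,
   not NUL-terminated). Returns the number of characters written, or
   -1 on overflow or insufficient space. */
long punycode_encode_sketch(const unsigned long *in, size_t len,
                            char *out, size_t cap)
{
    const unsigned long maxint = 0xFFFFFFFFUL; /* assumed 32-bit limit */
    unsigned long n = initial_n, delta = 0, bias = initial_bias;
    unsigned long h, b, m, q, k, t;
    size_t j, o = 0;

    /* Copy the basic code points, then a delimiter if there were any. */
    for (j = 0; j < len; ++j) {
        if (in[j] < 0x80) {
            if (o >= cap) return -1;
            out[o++] = (char)in[j];
        }
    }
    h = b = o;
    if (b > 0) {
        if (o >= cap) return -1;
        out[o++] = delimiter;
    }

    while (h < len) {
        /* m is the smallest code point >= n still to be handled. */
        m = maxint;
        for (j = 0; j < len; ++j)
            if (in[j] >= n && in[j] < m) m = in[j];
        if (m - n > (maxint - delta) / (h + 1)) return -1;
        delta += (m - n) * (h + 1);
        n = m;

        for (j = 0; j < len; ++j) {
            if (in[j] < n) {
                if (delta == maxint) return -1;
                ++delta;
            }
            if (in[j] == n) {
                /* Emit delta as a generalized variable-length integer. */
                for (q = delta, k = base; ; k += base) {
                    t = k <= bias ? tmin : k >= bias + tmax ? tmax : k - bias;
                    if (q < t) break;
                    if (o >= cap) return -1;
                    out[o++] = digit_char(t + (q - t) % (base - t));
                    q = (q - t) / (base - t);
                }
                if (o >= cap) return -1;
                out[o++] = digit_char(q);
                bias = adapt(delta, h + 1, h == b);
                delta = 0;
                ++h;
            }
        }
        ++delta;
        ++n;
    }
    return (long)o;
}
```

For the code points of example L in section 7.1, the sketch produces the 24 characters "3B-ww4c5e180e575a65lsy2b".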
690690+691691+6.4 Overflow handling
692692+693693+ For IDNA, 26-bit unsigned integers are sufficient to handle all valid
694694+ IDNA labels without overflow, because any string that needed a 27-bit
695695+ delta would have to exceed either the code point limit (0..10FFFF) or
696696+ the label length limit (63 characters). However, overflow handling
697697+ is necessary because the inputs are not necessarily valid IDNA
698698+ labels.
699699+700700+ If the programming language does not provide overflow detection, the
701701+ following technique can be used. Suppose A, B, and C are
702702+ representable nonnegative integers and C is nonzero. Then A + B
703703+ overflows if and only if B > maxint - A, and A + (B * C) overflows if
704704+ and only if B > (maxint - A) div C, where maxint is the greatest
705705+ integer for which maxint + 1 cannot be represented. Refer to
706706+ appendix C "Punycode sample implementation" for demonstrations of
707707+ this technique in the C language.
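The two overflow tests described above can be written as small helper functions. This is a sketch that assumes unsigned int as the integer type, so that maxint is UINT_MAX; the function names are hypothetical and do not appear in appendix C.

```c
#include <limits.h>

/* Nonzero if a + b cannot be represented in an unsigned int. */
int add_overflows(unsigned int a, unsigned int b)
{
    return b > UINT_MAX - a;
}

/* Nonzero if a + (b * c) cannot be represented in an unsigned int.
   c must be nonzero. */
int mul_add_overflows(unsigned int a, unsigned int b, unsigned int c)
{
    return b > (UINT_MAX - a) / c;
}
```

Both tests use only subtraction and division on representable values, so the checks themselves can never overflow.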
708708+709709+ The decoding and encoding algorithms shown in sections 6.2 and 6.3
710710+ handle overflow by detecting it whenever it happens. Another
711711+ approach is to enforce limits on the inputs that prevent overflow
712712+ from happening. For example, if the encoder were to verify that no
713713+ input code points exceed M and that the input length does not exceed
714714+ L, then no delta could ever exceed (M - initial_n) * (L + 1), and
715715+ hence no overflow could occur if integer variables were capable of
716716+ representing values that large. This prevention approach would
717717+ impose more restrictions on the input than the detection approach
718718+ does, but might be considered simpler in some programming languages.
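As a worked example of the prevention approach, the bound (M - initial_n) * (L + 1) can be computed directly. The function name max_delta_bound() is hypothetical, and the limits used in the note below (M = 10FFFF, L = 63) are chosen only for illustration.

```c
enum { initial_n = 0x80 };

/* Upper bound on any delta when no input code point exceeds max_cp
   and the input holds at most max_len code points. */
unsigned long max_delta_bound(unsigned long max_cp, unsigned long max_len)
{
    return (max_cp - initial_n) * (max_len + 1);
}
```

With max_cp = 0x10FFFF and max_len = 63 the bound is 71294912, which a 32-bit unsigned integer can represent.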
719719+720720+ In theory, the decoder could use an analogous approach, limiting the
721721+ number of digits in a variable-length integer (that is, limiting the
722722+ number of iterations in the innermost loop). However, the number of
723723+ digits that suffice to represent a given delta can sometimes
724724+ represent much larger deltas (because of the adaptation), and hence
725725+ this approach would probably need integers wider than 32 bits.
726726+727727+728728+729729+730730+Costello Standards Track [Page 13]
731731+732732+RFC 3492 IDNA Punycode March 2003
733733+734734+735735+ Yet another approach for the decoder is to allow overflow to occur,
736736+ but to check the final output string by re-encoding it and comparing
737737+ to the decoder input. If and only if they do not match (using a
738738+ case-insensitive ASCII comparison) overflow has occurred. This
739739+ delayed-detection approach would not impose any more restrictions on
740740+ the input than the immediate-detection approach does, and might be
741741+ considered simpler in some programming languages.
742742+743743+ In fact, if the decoder is used only inside the IDNA ToUnicode
744744+ operation [IDNA], then it need not check for overflow at all, because
745745+ ToUnicode performs a higher level re-encoding and comparison, and a
746746+ mismatch has the same consequence as if the Punycode decoder had
747747+ failed.
748748+749749+7. Punycode examples
750750+751751+7.1 Sample strings
752752+753753+ In the Punycode encodings below, the ACE prefix is not shown.
754754+ Backslashes show where line breaks have been inserted in strings too
755755+ long for one line.
756756+757757+ The first several examples are all translations of the sentence "Why
758758+ can't they just speak in <language>?" (courtesy of Michael Kaplan's
759759+ "provincial" page [PROVINCIAL]). Word breaks and punctuation have
760760+ been removed, as is often done in domain names.
761761+762762+ (A) Arabic (Egyptian):
763763+ u+0644 u+064A u+0647 u+0645 u+0627 u+0628 u+062A u+0643 u+0644
764764+ u+0645 u+0648 u+0634 u+0639 u+0631 u+0628 u+064A u+061F
765765+ Punycode: egbpdaj6bu4bxfgehfvwxn
766766+767767+ (B) Chinese (simplified):
768768+ u+4ED6 u+4EEC u+4E3A u+4EC0 u+4E48 u+4E0D u+8BF4 u+4E2D u+6587
769769+ Punycode: ihqwcrb4cv8a8dqg056pqjye
770770+771771+ (C) Chinese (traditional):
772772+ u+4ED6 u+5011 u+7232 u+4EC0 u+9EBD u+4E0D u+8AAA u+4E2D u+6587
773773+ Punycode: ihqwctvzc91f659drss3x8bo0yb
774774+775775+ (D) Czech: Pro<ccaron>prost<ecaron>nemluv<iacute><ccaron>esky
776776+ U+0050 u+0072 u+006F u+010D u+0070 u+0072 u+006F u+0073 u+0074
777777+ u+011B u+006E u+0065 u+006D u+006C u+0075 u+0076 u+00ED u+010D
778778+ u+0065 u+0073 u+006B u+0079
779779+ Punycode: Proprostnemluvesky-uyb24dma41a
780780+781781+782782+783783+784784+785785+786786+Costello Standards Track [Page 14]
787787+788788+RFC 3492 IDNA Punycode March 2003
789789+790790+791791+ (E) Hebrew:
792792+ u+05DC u+05DE u+05D4 u+05D4 u+05DD u+05E4 u+05E9 u+05D5 u+05D8
793793+ u+05DC u+05D0 u+05DE u+05D3 u+05D1 u+05E8 u+05D9 u+05DD u+05E2
794794+ u+05D1 u+05E8 u+05D9 u+05EA
795795+ Punycode: 4dbcagdahymbxekheh6e0a7fei0b
796796+797797+ (F) Hindi (Devanagari):
798798+ u+092F u+0939 u+0932 u+094B u+0917 u+0939 u+093F u+0928 u+094D
799799+ u+0926 u+0940 u+0915 u+094D u+092F u+094B u+0902 u+0928 u+0939
800800+ u+0940 u+0902 u+092C u+094B u+0932 u+0938 u+0915 u+0924 u+0947
801801+ u+0939 u+0948 u+0902
802802+ Punycode: i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd
803803+804804+ (G) Japanese (kanji and hiragana):
805805+ u+306A u+305C u+307F u+3093 u+306A u+65E5 u+672C u+8A9E u+3092
806806+ u+8A71 u+3057 u+3066 u+304F u+308C u+306A u+3044 u+306E u+304B
807807+ Punycode: n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa
808808+809809+ (H) Korean (Hangul syllables):
810810+ u+C138 u+ACC4 u+C758 u+BAA8 u+B4E0 u+C0AC u+B78C u+B4E4 u+C774
811811+ u+D55C u+AD6D u+C5B4 u+B97C u+C774 u+D574 u+D55C u+B2E4 u+BA74
812812+ u+C5BC u+B9C8 u+B098 u+C88B u+C744 u+AE4C
813813+ Punycode: 989aomsvi5e83db1d2a355cv1e0vak1dwrv93d5xbh15a0dt30a5j\
814814+ psd879ccm6fea98c
815815+816816+ (I) Russian (Cyrillic):
817817+ U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E
818818+ u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440
819819+ u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A
820820+ u+0438
821821+ Punycode: b1abfaaepdrnnbgefbaDotcwatmq2g4l
822822+823823+ (J) Spanish: Porqu<eacute>nopuedensimplementehablarenEspa<ntilde>ol
824824+ U+0050 u+006F u+0072 u+0071 u+0075 u+00E9 u+006E u+006F u+0070
825825+ u+0075 u+0065 u+0064 u+0065 u+006E u+0073 u+0069 u+006D u+0070
826826+ u+006C u+0065 u+006D u+0065 u+006E u+0074 u+0065 u+0068 u+0061
827827+ u+0062 u+006C u+0061 u+0072 u+0065 u+006E U+0045 u+0073 u+0070
828828+ u+0061 u+00F1 u+006F u+006C
829829+ Punycode: PorqunopuedensimplementehablarenEspaol-fmd56a
830830+831831+ (K) Vietnamese:
832832+ T<adotbelow>isaoh<odotbelow>kh<ocirc>ngth<ecirchookabove>ch\
833833+ <ihookabove>n<oacute>iti<ecircacute>ngVi<ecircdotbelow>t
834834+ U+0054 u+1EA1 u+0069 u+0073 u+0061 u+006F u+0068 u+1ECD u+006B
835835+ u+0068 u+00F4 u+006E u+0067 u+0074 u+0068 u+1EC3 u+0063 u+0068
836836+ u+1EC9 u+006E u+00F3 u+0069 u+0074 u+0069 u+1EBF u+006E u+0067
837837+ U+0056 u+0069 u+1EC7 u+0074
838838+ Punycode: TisaohkhngthchnitingVit-kjcr8268qyxafd2f1b9g
839839+840840+841841+842842+Costello Standards Track [Page 15]
843843+844844+RFC 3492 IDNA Punycode March 2003
845845+846846+847847+ The next several examples are all names of Japanese music artists,
848848+ song titles, and TV programs, just because the author happens to have
849849+ them handy (but Japanese is useful for providing examples of single-
850850+ row text, two-row text, ideographic text, and various mixtures
851851+ thereof).
852852+853853+ (L) 3<nen>B<gumi><kinpachi><sensei>
854854+ u+0033 u+5E74 U+0042 u+7D44 u+91D1 u+516B u+5148 u+751F
855855+ Punycode: 3B-ww4c5e180e575a65lsy2b
856856+857857+ (M) <amuro><namie>-with-SUPER-MONKEYS
858858+ u+5B89 u+5BA4 u+5948 u+7F8E u+6075 u+002D u+0077 u+0069 u+0074
859859+ u+0068 u+002D U+0053 U+0055 U+0050 U+0045 U+0052 u+002D U+004D
860860+ U+004F U+004E U+004B U+0045 U+0059 U+0053
861861+ Punycode: -with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n
862862+863863+ (N) Hello-Another-Way-<sorezore><no><basho>
864864+ U+0048 u+0065 u+006C u+006C u+006F u+002D U+0041 u+006E u+006F
865865+ u+0074 u+0068 u+0065 u+0072 u+002D U+0057 u+0061 u+0079 u+002D
866866+ u+305D u+308C u+305E u+308C u+306E u+5834 u+6240
867867+ Punycode: Hello-Another-Way--fc4qua05auwb3674vfr0b
868868+869869+ (O) <hitotsu><yane><no><shita>2
870870+ u+3072 u+3068 u+3064 u+5C4B u+6839 u+306E u+4E0B u+0032
871871+ Punycode: 2-u9tlzr9756bt3uc0v
872872+873873+ (P) Maji<de>Koi<suru>5<byou><mae>
874874+ U+004D u+0061 u+006A u+0069 u+3067 U+004B u+006F u+0069 u+3059
875875+ u+308B u+0035 u+79D2 u+524D
876876+ Punycode: MajiKoi5-783gue6qz075azm5e
877877+878878+ (Q) <pafii>de<runba>
879879+ u+30D1 u+30D5 u+30A3 u+30FC u+0064 u+0065 u+30EB u+30F3 u+30D0
880880+ Punycode: de-jg4avhby1noc0d
881881+882882+ (R) <sono><supiido><de>
883883+ u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067
884884+ Punycode: d9juau41awczczp
885885+886886+ The last example is an ASCII string that breaks the existing rules
887887+ for host name labels. (It is not a realistic example for IDNA,
888888+ because IDNA never encodes pure ASCII labels.)
889889+890890+ (S) -> $1.00 <-
891891+ u+002D u+003E u+0020 u+0024 u+0031 u+002E u+0030 u+0030 u+0020
892892+ u+003C u+002D
893893+ Punycode: -> $1.00 <--
894894+895895+896896+897897+898898+Costello Standards Track [Page 16]
899899+900900+RFC 3492 IDNA Punycode March 2003
901901+902902+903903+7.2 Decoding traces
904904+905905+ In the following traces, the evolving state of the decoder is shown
906906+ as a sequence of hexadecimal values, representing the code points in
907907+ the extended string. An asterisk appears just after the most
908908+ recently inserted code point, indicating both n (the value preceding
909909+ the asterisk) and i (the position of the value just after the
910910+ asterisk). Other numerical values are decimal.
911911+912912+ Decoding trace of example B from section 7.1:
913913+914914+ n is 128, i is 0, bias is 72
915915+ input is "ihqwcrb4cv8a8dqg056pqjye"
916916+ there is no delimiter, so extended string starts empty
917917+ delta "ihq" decodes to 19853
918918+ bias becomes 21
919919+ 4E0D *
920920+ delta "wc" decodes to 64
921921+ bias becomes 20
922922+ 4E0D 4E2D *
923923+ delta "rb" decodes to 37
924924+ bias becomes 13
925925+ 4E3A * 4E0D 4E2D
926926+ delta "4c" decodes to 56
927927+ bias becomes 17
928928+ 4E3A 4E48 * 4E0D 4E2D
929929+ delta "v8a" decodes to 599
930930+ bias becomes 32
931931+ 4E3A 4EC0 * 4E48 4E0D 4E2D
932932+ delta "8d" decodes to 130
933933+ bias becomes 23
934934+ 4ED6 * 4E3A 4EC0 4E48 4E0D 4E2D
935935+ delta "qg" decodes to 154
936936+ bias becomes 25
937937+ 4ED6 4EEC * 4E3A 4EC0 4E48 4E0D 4E2D
938938+ delta "056p" decodes to 46301
939939+ bias becomes 84
940940+ 4ED6 4EEC 4E3A 4EC0 4E48 4E0D 4E2D 6587 *
941941+ delta "qjye" decodes to 88531
942942+ bias becomes 90
943943+ 4ED6 4EEC 4E3A 4EC0 4E48 4E0D 8BF4 * 4E2D 6587
957957+958958+959959+ Decoding trace of example L from section 7.1:
960960+961961+ n is 128, i is 0, bias is 72
962962+ input is "3B-ww4c5e180e575a65lsy2b"
963963+ literal portion is "3B-", so extended string starts as:
964964+ 0033 0042
965965+ delta "ww4c" decodes to 62042
966966+ bias becomes 27
967967+ 0033 0042 5148 *
968968+ delta "5e" decodes to 139
969969+ bias becomes 24
970970+ 0033 0042 516B * 5148
971971+ delta "180e" decodes to 16683
972972+ bias becomes 67
973973+ 0033 5E74 * 0042 516B 5148
974974+ delta "575a" decodes to 34821
975975+ bias becomes 82
976976+ 0033 5E74 0042 516B 5148 751F *
977977+ delta "65l" decodes to 14592
978978+ bias becomes 67
979979+ 0033 5E74 0042 7D44 * 516B 5148 751F
980980+ delta "sy2b" decodes to 42088
981981+ bias becomes 84
982982+ 0033 5E74 0042 7D44 91D1 * 516B 5148 751F
10131013+10141014+10151015+7.3 Encoding traces
10161016+10171017+ In the following traces, code point values are hexadecimal, while
10181018+ other numerical values are decimal.
10191019+10201020+ Encoding trace of example B from section 7.1:
10211021+10221022+ bias is 72
10231023+ input is:
10241024+ 4ED6 4EEC 4E3A 4EC0 4E48 4E0D 8BF4 4E2D 6587
10251025+ there are no basic code points, so no literal portion
10261026+ next code point to insert is 4E0D
10271027+ needed delta is 19853, encodes as "ihq"
10281028+ bias becomes 21
10291029+ next code point to insert is 4E2D
10301030+ needed delta is 64, encodes as "wc"
10311031+ bias becomes 20
10321032+ next code point to insert is 4E3A
10331033+ needed delta is 37, encodes as "rb"
10341034+ bias becomes 13
10351035+ next code point to insert is 4E48
10361036+ needed delta is 56, encodes as "4c"
10371037+ bias becomes 17
10381038+ next code point to insert is 4EC0
10391039+ needed delta is 599, encodes as "v8a"
10401040+ bias becomes 32
10411041+ next code point to insert is 4ED6
10421042+ needed delta is 130, encodes as "8d"
10431043+ bias becomes 23
10441044+ next code point to insert is 4EEC
10451045+ needed delta is 154, encodes as "qg"
10461046+ bias becomes 25
10471047+ next code point to insert is 6587
10481048+ needed delta is 46301, encodes as "056p"
10491049+ bias becomes 84
10501050+ next code point to insert is 8BF4
10511051+ needed delta is 88531, encodes as "qjye"
10521052+ bias becomes 90
10531053+ output is "ihqwcrb4cv8a8dqg056pqjye"
10691069+10701070+10711071+ Encoding trace of example L from section 7.1:
10721072+10731073+ bias is 72
10741074+ input is:
10751075+ 0033 5E74 0042 7D44 91D1 516B 5148 751F
10761076+ basic code points (0033, 0042) are copied to literal portion: "3B-"
10771077+ next code point to insert is 5148
10781078+ needed delta is 62042, encodes as "ww4c"
10791079+ bias becomes 27
10801080+ next code point to insert is 516B
10811081+ needed delta is 139, encodes as "5e"
10821082+ bias becomes 24
10831083+ next code point to insert is 5E74
10841084+ needed delta is 16683, encodes as "180e"
10851085+ bias becomes 67
10861086+ next code point to insert is 751F
10871087+ needed delta is 34821, encodes as "575a"
10881088+ bias becomes 82
10891089+ next code point to insert is 7D44
10901090+ needed delta is 14592, encodes as "65l"
10911091+ bias becomes 67
10921092+ next code point to insert is 91D1
10931093+ needed delta is 42088, encodes as "sy2b"
10941094+ bias becomes 84
10951095+ output is "3B-ww4c5e180e575a65lsy2b"
10961096+10971097+8. Security Considerations
10981098+10991099+ Users expect each domain name in DNS to be controlled by a single
11001100+ authority. If a Unicode string intended for use as a domain label
11011101+ could map to multiple ACE labels, then an internationalized domain
11021102+ name could map to multiple ASCII domain names, each controlled by a
11031103+ different authority, some of which could be spoofs that hijack
11041104+ service requests intended for another. Therefore Punycode is
11051105+ designed so that each Unicode string has a unique encoding.
11061106+11071107+ However, there can still be multiple Unicode representations of the
11081108+ "same" text, for various definitions of "same". This problem is
11091109+ addressed to some extent by the Unicode standard under the topic of
11101110+ canonicalization, and this work is leveraged for domain names by
11111111+ Nameprep [NAMEPREP].
11121112+11131113+11141114+11151115+11161116+11171117+11181118+11191119+11201120+11211121+11221122+Costello Standards Track [Page 20]
11231123+11241124+RFC 3492 IDNA Punycode March 2003
11251125+11261126+11271127+9. References
11281128+11291129+9.1 Normative References
11301130+11311131+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
11321132+ Requirement Levels", BCP 14, RFC 2119, March 1997.
11331133+11341134+9.2 Informative References
11351135+11361136+ [RFC952] Harrenstien, K., Stahl, M. and E. Feinler, "DOD Internet
11371137+ Host Table Specification", RFC 952, October 1985.
11381138+11391139+ [RFC1034] Mockapetris, P., "Domain Names - Concepts and
11401140+ Facilities", STD 13, RFC 1034, November 1987.
11411141+11421142+ [IDNA] Faltstrom, P., Hoffman, P. and A. Costello,
11431143+ "Internationalizing Domain Names in Applications
11441144+ (IDNA)", RFC 3490, March 2003.
11451145+11461146+ [NAMEPREP] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
11471147+ Profile for Internationalized Domain Names (IDN)", RFC
11481148+ 3491, March 2003.
11491149+11501150+ [ASCII] Cerf, V., "ASCII format for Network Interchange", RFC
11511151+ 20, October 1969.
11521152+11531153+ [PROVINCIAL] Kaplan, M., "The 'anyone can be provincial!' page",
11541154+ http://www.trigeminal.com/samples/provincial.html.
11551155+11561156+ [UNICODE] The Unicode Consortium, "The Unicode Standard",
11571157+ http://www.unicode.org/unicode/standard/standard.html.
11581158+11591159+11601160+11611161+11621162+11631163+11641164+11651165+11661166+11671167+11681168+11691169+11701170+11711171+11721172+11731173+11741174+11751175+11761176+11771177+11781178+Costello Standards Track [Page 21]
11791179+11801180+RFC 3492 IDNA Punycode March 2003
11811181+11821182+11831183+A. Mixed-case annotation
11841184+11851185+ In order to use Punycode to represent case-insensitive strings,
11861186+ higher layers need to case-fold the strings prior to Punycode
11871187+ encoding. The encoded string can use mixed case as an annotation
11881188+ telling how to convert the folded string into a mixed-case string for
11891189+ display purposes. Note, however, that mixed-case annotation is not
11901190+ used by the ToASCII and ToUnicode operations specified in [IDNA], and
11911191+ therefore implementors of IDNA can disregard this appendix.
11921192+11931193+ Basic code points can use mixed case directly, because the decoder
11941194+ copies them verbatim, leaving lowercase code points lowercase, and
11951195+ leaving uppercase code points uppercase. Each non-basic code point
11961196+ is represented by a delta, which is represented by a sequence of
11971197+ basic code points, the last of which provides the annotation. If it
11981198+ is uppercase, it is a suggestion to map the non-basic code point to
11991199+ uppercase (if possible); if it is lowercase, it is a suggestion to
12001200+ map the non-basic code point to lowercase (if possible).
12011201+12021202+ These annotations do not alter the code points returned by decoders;
12031203+ the annotations are returned separately, for the caller to use or
12041204+ ignore. Encoders can accept annotations in addition to code points,
12051205+ but the annotations do not alter the output, except to influence the
12061206+ uppercase/lowercase form of ASCII letters.
12071207+12081208+ Punycode encoders and decoders need not support these annotations,
12091209+ and higher layers need not use them.
12101210+12111211+B. Disclaimer and license
12121212+12131213+ Regarding this entire document or any portion of it (including the
12141214+ pseudocode and C code), the author makes no guarantees and is not
12151215+ responsible for any damage resulting from its use. The author grants
12161216+ irrevocable permission to anyone to use, modify, and distribute it in
12171217+ any way that does not diminish the rights of anyone else to use,
12181218+ modify, and distribute it, provided that redistributed derivative
12191219+ works do not contain misleading author or version information.
12201220+ Derivative works need not be licensed under similar terms.
12211221+12221222+12231223+12241224+12251225+12261226+12271227+12281228+12291229+12301230+12311231+12321232+12331233+12341234+Costello Standards Track [Page 22]
12351235+12361236+RFC 3492 IDNA Punycode March 2003
12371237+12381238+12391239+C. Punycode sample implementation
12401240+12411241+/*
12421242+punycode.c from RFC 3492
12431243+http://www.nicemice.net/idn/
12441244+Adam M. Costello
12451245+http://www.nicemice.net/amc/
12461246+12471247+This is ANSI C code (C89) implementing Punycode (RFC 3492).
12481248+12491249+*/
12501250+12511251+12521252+/************************************************************/
12531253+/* Public interface (would normally go in its own .h file): */
12541254+12551255+#include <limits.h>
12561256+12571257+enum punycode_status {
12581258+ punycode_success,
12591259+ punycode_bad_input, /* Input is invalid. */
12601260+ punycode_big_output, /* Output would exceed the space provided. */
12611261+ punycode_overflow /* Input needs wider integers to process. */
12621262+};
12631263+12641264+#if UINT_MAX >= (1 << 26) - 1
12651265+typedef unsigned int punycode_uint;
12661266+#else
12671267+typedef unsigned long punycode_uint;
12681268+#endif
12691269+12701270+enum punycode_status punycode_encode(
12711271+ punycode_uint input_length,
12721272+ const punycode_uint input[],
12731273+ const unsigned char case_flags[],
12741274+ punycode_uint *output_length,
12751275+ char output[] );
12761276+12771277+ /* punycode_encode() converts Unicode to Punycode. The input */
12781278+ /* is represented as an array of Unicode code points (not code */
12791279+ /* units; surrogate pairs are not allowed), and the output */
12801280+ /* will be represented as an array of ASCII code points. The */
12811281+ /* output string is *not* null-terminated; it will contain */
12821282+ /* zeros if and only if the input contains zeros. (Of course */
12831283+ /* the caller can leave room for a terminator and add one if */
12841284+ /* needed.) The input_length is the number of code points in */
12851285+ /* the input. The output_length is an in/out argument: the */
12861286+ /* caller passes in the maximum number of code points that it */
12871287+12881288+12891289+12901290+Costello Standards Track [Page 23]
12911291+12921292+RFC 3492 IDNA Punycode March 2003
12931293+12941294+12951295+ /* can receive, and on successful return it will contain the */
12961296+ /* number of code points actually output. The case_flags array */
12971297+ /* holds input_length boolean values, where nonzero suggests that */
12981298+ /* the corresponding Unicode character be forced to uppercase */
12991299+ /* after being decoded (if possible), and zero suggests that */
13001300+ /* it be forced to lowercase (if possible). ASCII code points */
13011301+ /* are encoded literally, except that ASCII letters are forced */
13021302+ /* to uppercase or lowercase according to the corresponding */
13031303+ /* uppercase flags. If case_flags is a null pointer then ASCII */
13041304+ /* letters are left as they are, and other code points are */
13051305+ /* treated as if their uppercase flags were zero. The return */
13061306+ /* value can be any of the punycode_status values defined above */
13071307+ /* except punycode_bad_input; if not punycode_success, then */
13081308+ /* output_length and output might contain garbage. */
13091309+13101310+enum punycode_status punycode_decode(
13111311+ punycode_uint input_length,
13121312+ const char input[],
13131313+ punycode_uint *output_length,
13141314+ punycode_uint output[],
13151315+ unsigned char case_flags[] );
13161316+13171317+ /* punycode_decode() converts Punycode to Unicode. The input is */
13181318+ /* represented as an array of ASCII code points, and the output */
13191319+ /* will be represented as an array of Unicode code points. The */
13201320+ /* input_length is the number of code points in the input. The */
13211321+ /* output_length is an in/out argument: the caller passes in */
13221322+ /* the maximum number of code points that it can receive, and */
13231323+ /* on successful return it will contain the actual number of */
13241324+ /* code points output. The case_flags array needs room for at */
13251325+ /* least output_length values, or it can be a null pointer if the */
13261326+ /* case information is not needed. A nonzero flag suggests that */
13271327+ /* the corresponding Unicode character be forced to uppercase */
13281328+ /* by the caller (if possible), while zero suggests that it be */
13291329+ /* forced to lowercase (if possible). ASCII code points are */
13301330+ /* output already in the proper case, but their flags will be set */
13311331+ /* appropriately so that applying the flags would be harmless. */
13321332+ /* The return value can be any of the punycode_status values */
13331333+ /* defined above; if not punycode_success, then output_length, */
13341334+ /* output, and case_flags might contain garbage. On success, the */
13351335+ /* decoder will never need to write an output_length greater than */
13361336+ /* input_length, because of how the encoding is defined. */
13371337+13381338+/**********************************************************/
13391339+/* Implementation (would normally go in its own .c file): */
13401340+13411341+#include <string.h>
13491349+13501350+13511351+/*** Bootstring parameters for Punycode ***/
13521352+13531353+enum { base = 36, tmin = 1, tmax = 26, skew = 38, damp = 700,
13541354+ initial_bias = 72, initial_n = 0x80, delimiter = 0x2D };
13551355+13561356+/* basic(cp) tests whether cp is a basic code point: */
13571357+#define basic(cp) ((punycode_uint)(cp) < 0x80)
13581358+13591359+/* delim(cp) tests whether cp is a delimiter: */
13601360+#define delim(cp) ((cp) == delimiter)
13611361+13621362+/* decode_digit(cp) returns the numeric value of a basic code */
13631363+/* point (for use in representing integers) in the range 0 to */
13641364+/* base-1, or base if cp does not represent a value. */
13651365+13661366+static punycode_uint decode_digit(punycode_uint cp)
13671367+{
13681368+ return cp - 48 < 10 ? cp - 22 : cp - 65 < 26 ? cp - 65 :
13691369+ cp - 97 < 26 ? cp - 97 : base;
13701370+}
13711371+13721372+/* encode_digit(d,flag) returns the basic code point whose value */
13731373+/* (when used for representing integers) is d, which needs to be in */
13741374+/* the range 0 to base-1. The lowercase form is used unless flag is */
13751375+/* nonzero, in which case the uppercase form is used. The behavior */
13761376+/* is undefined if flag is nonzero and digit d has no uppercase form. */
13771377+13781378+static char encode_digit(punycode_uint d, int flag)
13791379+{
13801380+ return d + 22 + 75 * (d < 26) - ((flag != 0) << 5);
13811381+ /* 0..25 map to ASCII a..z or A..Z */
13821382+ /* 26..35 map to ASCII 0..9 */
13831383+}
13841384+13851385+/* flagged(bcp) tests whether a basic code point is flagged */
13861386+/* (uppercase). The behavior is undefined if bcp is not a */
13871387+/* basic code point. */
13881388+13891389+#define flagged(bcp) ((punycode_uint)(bcp) - 65 < 26)
13901390+13911391+/* encode_basic(bcp,flag) forces a basic code point to lowercase */
13921392+/* if flag is zero, uppercase if flag is nonzero, and returns */
13931393+/* the resulting code point. The code point is unchanged if it */
13941394+/* is caseless. The behavior is undefined if bcp is not a basic */
13951395+/* code point. */
13961396+13971397+static char encode_basic(punycode_uint bcp, int flag)
13981398+{
14051405+14061406+14071407+ bcp -= (bcp - 97 < 26) << 5;
14081408+ return bcp + ((!flag && (bcp - 65 < 26)) << 5);
14091409+}
14101410+14111411+/*** Platform-specific constants ***/
14121412+14131413+/* maxint is the maximum value of a punycode_uint variable: */
14141414+static const punycode_uint maxint = -1;
14151415+/* Because maxint is unsigned, -1 becomes the maximum value. */
14161416+14171417+/*** Bias adaptation function ***/
14181418+14191419+static punycode_uint adapt(
14201420+ punycode_uint delta, punycode_uint numpoints, int firsttime )
14211421+{
14221422+ punycode_uint k;
14231423+14241424+ delta = firsttime ? delta / damp : delta >> 1;
14251425+ /* delta >> 1 is a faster way of doing delta / 2 */
14261426+ delta += delta / numpoints;
14271427+14281428+ for (k = 0; delta > ((base - tmin) * tmax) / 2; k += base) {
14291429+ delta /= base - tmin;
14301430+ }
14311431+14321432+ return k + (base - tmin + 1) * delta / (delta + skew);
14331433+}
14341434+14351435+/*** Main encode function ***/
14361436+14371437+enum punycode_status punycode_encode(
14381438+ punycode_uint input_length,
14391439+ const punycode_uint input[],
14401440+ const unsigned char case_flags[],
14411441+ punycode_uint *output_length,
14421442+ char output[] )
14431443+{
14441444+ punycode_uint n, delta, h, b, out, max_out, bias, j, m, q, k, t;
14451445+14461446+ /* Initialize the state: */
14471447+14481448+ n = initial_n;
14491449+ delta = out = 0;
14501450+ max_out = *output_length;
14511451+ bias = initial_bias;
14521452+14531453+ /* Handle the basic code points: */
14611461+14621462+14631463+ for (j = 0; j < input_length; ++j) {
14641464+ if (basic(input[j])) {
14651465+ if (max_out - out < 2) return punycode_big_output;
14661466+ output[out++] =
14671467+ case_flags ? encode_basic(input[j], case_flags[j]) : input[j];
14681468+ }
14691469+ /* else if (input[j] < n) return punycode_bad_input; */
14701470+ /* (not needed for Punycode with unsigned code points) */
14711471+ }
14721472+14731473+ h = b = out;
14741474+14751475+ /* h is the number of code points that have been handled, b is the */
14761476+ /* number of basic code points, and out is the number of characters */
14771477+ /* that have been output. */
14781478+14791479+ if (b > 0) output[out++] = delimiter;
14801480+14811481+ /* Main encoding loop: */
14821482+14831483+ while (h < input_length) {
14841484+ /* All non-basic code points < n have been */
14851485+ /* handled already. Find the next larger one: */
14861486+14871487+ for (m = maxint, j = 0; j < input_length; ++j) {
14881488+ /* if (basic(input[j])) continue; */
14891489+ /* (not needed for Punycode) */
14901490+ if (input[j] >= n && input[j] < m) m = input[j];
14911491+ }
14921492+14931493+ /* Increase delta enough to advance the decoder's */
14941494+ /* <n,i> state to <m,0>, but guard against overflow: */
14951495+14961496+ if (m - n > (maxint - delta) / (h + 1)) return punycode_overflow;
14971497+ delta += (m - n) * (h + 1);
14981498+ n = m;
14991499+15001500+ for (j = 0; j < input_length; ++j) {
15011501+ /* Punycode does not need to check whether input[j] is basic: */
15021502+ if (input[j] < n /* || basic(input[j]) */ ) {
15031503+ if (++delta == 0) return punycode_overflow;
15041504+ }
15051505+15061506+ if (input[j] == n) {
15071507+ /* Represent delta as a generalized variable-length integer: */
15081508+15091509+ for (q = delta, k = base; ; k += base) {
15101510+ if (out >= max_out) return punycode_big_output;
15171517+15181518+15191519+ t = k <= bias /* + tmin */ ? tmin : /* +tmin not needed */
15201520+ k >= bias + tmax ? tmax : k - bias;
15211521+ if (q < t) break;
15221522+ output[out++] = encode_digit(t + (q - t) % (base - t), 0);
15231523+ q = (q - t) / (base - t);
15241524+ }
15251525+15261526+ output[out++] = encode_digit(q, case_flags && case_flags[j]);
15271527+ bias = adapt(delta, h + 1, h == b);
15281528+ delta = 0;
15291529+ ++h;
15301530+ }
15311531+ }
15321532+15331533+ ++delta, ++n;
15341534+ }
15351535+15361536+ *output_length = out;
15371537+ return punycode_success;
15381538+}
15391539+15401540+/*** Main decode function ***/
15411541+15421542+enum punycode_status punycode_decode(
15431543+ punycode_uint input_length,
15441544+ const char input[],
15451545+ punycode_uint *output_length,
15461546+ punycode_uint output[],
15471547+ unsigned char case_flags[] )
15481548+{
15491549+ punycode_uint n, out, i, max_out, bias,
15501550+ b, j, in, oldi, w, k, digit, t;
15511551+15521552+ /* Initialize the state: */
15531553+15541554+ n = initial_n;
15551555+ out = i = 0;
15561556+ max_out = *output_length;
15571557+ bias = initial_bias;
15581558+15591559+ /* Handle the basic code points: Let b be the number of input code */
15601560+ /* points before the last delimiter, or 0 if there is none, then */
15611561+ /* copy the first b code points to the output. */
15621562+15631563+ for (b = j = 0; j < input_length; ++j) if (delim(input[j])) b = j;
15641564+ if (b > max_out) return punycode_big_output;
15651565+15661566+ for (j = 0; j < b; ++j) {
15731573+15741574+15751575+ if (case_flags) case_flags[out] = flagged(input[j]);
15761576+ if (!basic(input[j])) return punycode_bad_input;
15771577+ output[out++] = input[j];
15781578+ }
15791579+15801580+ /* Main decoding loop: Start just after the last delimiter if any */
15811581+ /* basic code points were copied; start at the beginning otherwise. */
15821582+15831583+ for (in = b > 0 ? b + 1 : 0; in < input_length; ++out) {
15841584+15851585+ /* in is the index of the next character to be consumed, and */
15861586+ /* out is the number of code points in the output array. */
15871587+15881588+ /* Decode a generalized variable-length integer into delta, */
15891589+ /* which gets added to i. The overflow checking is easier */
15901590+ /* if we increase i as we go, then subtract off its starting */
15911591+ /* value at the end to obtain delta. */
15921592+15931593+ for (oldi = i, w = 1, k = base; ; k += base) {
15941594+ if (in >= input_length) return punycode_bad_input;
15951595+ digit = decode_digit(input[in++]);
15961596+ if (digit >= base) return punycode_bad_input;
15971597+ if (digit > (maxint - i) / w) return punycode_overflow;
15981598+ i += digit * w;
15991599+ t = k <= bias /* + tmin */ ? tmin : /* +tmin not needed */
16001600+ k >= bias + tmax ? tmax : k - bias;
16011601+ if (digit < t) break;
16021602+ if (w > maxint / (base - t)) return punycode_overflow;
16031603+ w *= (base - t);
16041604+ }
16051605+16061606+ bias = adapt(i - oldi, out + 1, oldi == 0);
16071607+16081608+ /* i was supposed to wrap around from out+1 to 0, */
16091609+ /* incrementing n each time, so we'll fix that now: */
16101610+16111611+ if (i / (out + 1) > maxint - n) return punycode_overflow;
16121612+ n += i / (out + 1);
16131613+ i %= (out + 1);
16141614+16151615+ /* Insert n at position i of the output: */
16161616+16171617+ /* not needed for Punycode: */
16181618+ /* if (decode_digit(n) <= base) return punycode_invalid_input; */
16191619+ if (out >= max_out) return punycode_big_output;
16201620+16211621+ if (case_flags) {
16221622+ memmove(case_flags + i + 1, case_flags + i, out - i);
16291629+16301630+16311631+ /* Case of last character determines uppercase flag: */
16321632+ case_flags[i] = flagged(input[in - 1]);
16331633+ }
16341634+16351635+ memmove(output + i + 1, output + i, (out - i) * sizeof *output);
16361636+ output[i++] = n;
16371637+ }
16381638+16391639+ *output_length = out;
16401640+ return punycode_success;
16411641+}
16421642+16431643+/******************************************************************/
16441644+/* Wrapper for testing (would normally go in a separate .c file): */
16451645+16461646+#include <assert.h>
16471647+#include <stdio.h>
16481648+#include <stdlib.h>
16491649+#include <string.h>
16501650+16511651+/* For testing, we'll just set some compile-time limits rather than */
16521652+/* use malloc(), and set a compile-time option rather than using a */
16531653+/* command-line option. */
16541654+16551655+enum {
16561656+ unicode_max_length = 256,
16571657+ ace_max_length = 256
16581658+};
16591659+16601660+static void usage(char **argv)
16611661+{
16621662+ fprintf(stderr,
16631663+ "\n"
16641664+ "%s -e reads code points and writes a Punycode string.\n"
16651665+ "%s -d reads a Punycode string and writes code points.\n"
16661666+ "\n"
16671667+ "Input and output are plain text in the native character set.\n"
16681668+ "Code points are in the form u+hex separated by whitespace.\n"
16691669+ "Although the specification allows Punycode strings to contain\n"
16701670+ "any characters from the ASCII repertoire, this test code\n"
16711671+ "supports only the printable characters, and needs the Punycode\n"
16721672+ "string to be followed by a newline.\n"
16731673+ "The case of the u in u+hex is the force-to-uppercase flag.\n"
16741674+ , argv[0], argv[0]);
16751675+ exit(EXIT_FAILURE);
16761676+}
16771677+16781678+static void fail(const char *msg)
16851685+16861686+16871687+{
16881688+ fputs(msg,stderr);
16891689+ exit(EXIT_FAILURE);
16901690+}
16911691+16921692+static const char too_big[] =
16931693+ "input or output is too large, recompile with larger limits\n";
16941694+static const char invalid_input[] = "invalid input\n";
16951695+static const char overflow[] = "arithmetic overflow\n";
16961696+static const char io_error[] = "I/O error\n";
16971697+16981698+/* The following string is used to convert printable */
16991699+/* characters between ASCII and the native charset: */
17001700+17011701+static const char print_ascii[] =
17021702+ "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
17031703+ "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
17041704+ " !\"#$%&'()*+,-./"
17051705+ "0123456789:;<=>?"
17061706+ "@ABCDEFGHIJKLMNO"
17071707+ "PQRSTUVWXYZ[\\]^_"
17081708+ "`abcdefghijklmno"
17091709+ "pqrstuvwxyz{|}~\n";
17101710+17111711+int main(int argc, char **argv)
17121712+{
17131713+ enum punycode_status status;
17141714+ int r;
17151715+ unsigned int input_length, output_length, j;
17161716+ unsigned char case_flags[unicode_max_length];
17171717+17181718+ if (argc != 2) usage(argv);
17191719+ if (argv[1][0] != '-') usage(argv);
17201720+ if (argv[1][2] != 0) usage(argv);
17211721+17221722+ if (argv[1][1] == 'e') {
17231723+ punycode_uint input[unicode_max_length];
17241724+ unsigned long codept;
17251725+ char output[ace_max_length+1], uplus[3];
17261726+ int c;
17271727+17281728+ /* Read the input code points: */
17291729+17301730+ input_length = 0;
17311731+17321732+ for (;;) {
17331733+ r = scanf("%2s%lx", uplus, &codept);
17341734+ if (ferror(stdin)) fail(io_error);
17411741+17421742+17431743+ if (r == EOF || r == 0) break;
17441744+17451745+ if (r != 2 || uplus[1] != '+' || codept > (punycode_uint)-1) {
17461746+ fail(invalid_input);
17471747+ }
17481748+17491749+ if (input_length == unicode_max_length) fail(too_big);
17501750+17511751+ if (uplus[0] == 'u') case_flags[input_length] = 0;
17521752+ else if (uplus[0] == 'U') case_flags[input_length] = 1;
17531753+ else fail(invalid_input);
17541754+17551755+ input[input_length++] = codept;
17561756+ }
17571757+17581758+ /* Encode: */
17591759+17601760+ output_length = ace_max_length;
17611761+ status = punycode_encode(input_length, input, case_flags,
17621762+ &output_length, output);
17631763+ if (status == punycode_bad_input) fail(invalid_input);
17641764+ if (status == punycode_big_output) fail(too_big);
17651765+ if (status == punycode_overflow) fail(overflow);
17661766+ assert(status == punycode_success);
17671767+17681768+ /* Convert to native charset and output: */
17691769+17701770+ for (j = 0; j < output_length; ++j) {
17711771+ c = output[j];
17721772+ assert(c >= 0 && c <= 127);
17731773+ if (print_ascii[c] == 0) fail(invalid_input);
17741774+ output[j] = print_ascii[c];
17751775+ }
17761776+17771777+ output[j] = 0;
17781778+ r = puts(output);
17791779+ if (r == EOF) fail(io_error);
17801780+ return EXIT_SUCCESS;
17811781+ }
17821782+17831783+ if (argv[1][1] == 'd') {
17841784+ char input[ace_max_length+2], *p, *pp;
17851785+ punycode_uint output[unicode_max_length];
17861786+17871787+ /* Read the Punycode input string and convert to ASCII: */
17881788+17891789+ fgets(input, ace_max_length+2, stdin);
17901790+ if (ferror(stdin)) fail(io_error);
17971797+17981798+17991799+ if (feof(stdin)) fail(invalid_input);
18001800+ input_length = strlen(input) - 1;
18011801+ if (input[input_length] != '\n') fail(too_big);
18021802+ input[input_length] = 0;
18031803+18041804+ for (p = input; *p != 0; ++p) {
18051805+ pp = strchr(print_ascii, *p);
18061806+ if (pp == 0) fail(invalid_input);
18071807+ *p = pp - print_ascii;
18081808+ }
18091809+18101810+ /* Decode: */
18111811+18121812+ output_length = unicode_max_length;
18131813+ status = punycode_decode(input_length, input, &output_length,
18141814+ output, case_flags);
18151815+ if (status == punycode_bad_input) fail(invalid_input);
18161816+ if (status == punycode_big_output) fail(too_big);
18171817+ if (status == punycode_overflow) fail(overflow);
18181818+ assert(status == punycode_success);
18191819+18201820+ /* Output the result: */
18211821+18221822+ for (j = 0; j < output_length; ++j) {
18231823+ r = printf("%s+%04lX\n",
18241824+ case_flags[j] ? "U" : "u",
18251825+ (unsigned long) output[j] );
18261826+ if (r < 0) fail(io_error);
18271827+ }
18281828+18291829+ return EXIT_SUCCESS;
18301830+ }
18311831+18321832+ usage(argv);
18331833+ return EXIT_SUCCESS; /* not reached, but quiets compiler warning */
18341834+}
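The generalized variable-length integer step inside punycode_encode above can be isolated as a small sketch. The helper names used here (digit_to_char, threshold, encode_delta) are illustrative assumptions and are not part of the RFC's reference code; only the parameter values come from the listing. As a hand-worked case, the label "bücher" has five basic code points, so the first delta is (0xFC - 0x80) * 6 + 1 = 745, which under the initial bias of 72 yields the digits "kva", consistent with the commonly cited encoding "bcher-kva".

```c
#include <assert.h>
#include <stddef.h>

/* Punycode parameters, as in the reference listing. */
enum { BASE = 36, TMIN = 1, TMAX = 26 };

/* Hypothetical helper: map a digit value 0..35 to its lowercase basic
   code point (0..25 -> a..z, 26..35 -> 0..9), using ASCII values. */
char digit_to_char(unsigned d)
{
    assert(d < BASE);
    return (char)(d < 26 ? 0x61 + d : 0x30 + (d - 26));
}

/* Hypothetical helper: threshold for digit position k under the
   current bias, clamped to the range TMIN..TMAX. */
unsigned threshold(unsigned k, unsigned bias)
{
    if (k <= bias) return TMIN;
    if (k >= bias + TMAX) return TMAX;
    return k - bias;
}

/* Encode delta as a generalized variable-length integer.
   Writes a NUL-terminated digit string into out (capacity cap) and
   returns its length, or 0 if the buffer is too small. */
size_t encode_delta(unsigned long delta, unsigned bias, char *out, size_t cap)
{
    size_t len = 0;
    unsigned long q = delta;
    unsigned k, t;

    for (k = BASE; ; k += BASE) {
        t = threshold(k, bias);
        if (q < t) break;                 /* q fits in the final digit */
        if (len + 1 >= cap) return 0;     /* keep room for the NUL */
        out[len++] = digit_to_char((unsigned)(t + (q - t) % (BASE - t)));
        q = (q - t) / (BASE - t);
    }

    if (len + 1 >= cap) return 0;
    out[len++] = digit_to_char((unsigned)q);
    out[len] = '\0';
    return len;
}
```

The sketch leaves out bias adaptation and case flags; a full encoder would call adapt() after each emitted delta, as the reference code does.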
18351835+18361836+18371837+18381838+18391839+18401840+18411841+18421842+18431843+18441844+18451845+18461846+18471847+18481848+18491849+18501850+Costello Standards Track [Page 33]
18511851+18521852+RFC 3492 IDNA Punycode March 2003
18531853+18541854+18551855+Author's Address
18561856+18571857+ Adam M. Costello
18581858+ University of California, Berkeley
18591859+ http://www.nicemice.net/amc/
18601860+18611861+18621862+18631863+18641864+18651865+18661866+18671867+18681868+18691869+18701870+18711871+18721872+18731873+18741874+18751875+18761876+18771877+18781878+18791879+18801880+18811881+18821882+18831883+18841884+18851885+18861886+18871887+18881888+18891889+18901890+18911891+18921892+18931893+18941894+18951895+18961896+18971897+18981898+18991899+19001900+19011901+19021902+19031903+19041904+19051905+19061906+Costello Standards Track [Page 34]
19071907+19081908+RFC 3492 IDNA Punycode March 2003
19091909+19101910+19111911+Full Copyright Statement
19121912+19131913+ Copyright (C) The Internet Society (2003). All Rights Reserved.
19141914+19151915+ This document and translations of it may be copied and furnished to
19161916+ others, and derivative works that comment on or otherwise explain it
19171917+ or assist in its implementation may be prepared, copied, published
19181918+ and distributed, in whole or in part, without restriction of any
19191919+ kind, provided that the above copyright notice and this paragraph are
19201920+ included on all such copies and derivative works. However, this
19211921+ document itself may not be modified in any way, such as by removing
19221922+ the copyright notice or references to the Internet Society or other
19231923+ Internet organizations, except as needed for the purpose of
19241924+ developing Internet standards in which case the procedures for
19251925+ copyrights defined in the Internet Standards process must be
19261926+ followed, or as required to translate it into languages other than
19271927+ English.
19281928+19291929+ The limited permissions granted above are perpetual and will not be
19301930+ revoked by the Internet Society or its successors or assigns.
19311931+19321932+ This document and the information contained herein is provided on an
19331933+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
19341934+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
19351935+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
19361936+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
19371937+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
19381938+19391939+Acknowledgement
19401940+19411941+ Funding for the RFC Editor function is currently provided by the
19421942+ Internet Society.
19431943+19441944+19451945+19461946+19471947+19481948+19491949+19501950+19511951+19521952+19531953+19541954+19551955+19561956+19571957+19581958+19591959+19601960+19611961+19621962+Costello Standards Track [Page 35]
19631963+
+1291
spec/rfc5890.txt
···11+22+33+44+55+66+77+Internet Engineering Task Force (IETF) J. Klensin
88+Request for Comments: 5890 August 2010
99+Obsoletes: 3490
1010+Category: Standards Track
1111+ISSN: 2070-1721
1212+1313+1414+ Internationalized Domain Names for Applications (IDNA):
1515+ Definitions and Document Framework
1616+1717+Abstract
1818+1919+ This document is one of a collection that, together, describe the
2020+ protocol and usage context for a revision of Internationalized Domain
2121+ Names for Applications (IDNA), superseding the earlier version. It
2222+ describes the document collection and provides definitions and other
2323+ material that are common to the set.
2424+2525+Status of This Memo
2626+2727+ This is an Internet Standards Track document.
2828+2929+ This document is a product of the Internet Engineering Task Force
3030+ (IETF). It represents the consensus of the IETF community. It has
3131+ received public review and has been approved for publication by the
3232+ Internet Engineering Steering Group (IESG). Further information on
3333+ Internet Standards is available in Section 2 of RFC 5741.
3434+3535+ Information about the current status of this document, any errata,
3636+ and how to provide feedback on it may be obtained at
3737+ http://www.rfc-editor.org/info/rfc5890.
3838+3939+4040+4141+4242+4343+4444+4545+4646+4747+4848+4949+5050+5151+5252+5353+5454+5555+5656+5757+5858+Klensin Standards Track [Page 1]
5959+6060+RFC 5890 IDNA Definitions August 2010
6161+6262+6363+Copyright Notice
6464+6565+ Copyright (c) 2010 IETF Trust and the persons identified as the
6666+ document authors. All rights reserved.
6767+6868+ This document is subject to BCP 78 and the IETF Trust's Legal
6969+ Provisions Relating to IETF Documents
7070+ (http://trustee.ietf.org/license-info) in effect on the date of
7171+ publication of this document. Please review these documents
7272+ carefully, as they describe your rights and restrictions with respect
7373+ to this document. Code Components extracted from this document must
7474+ include Simplified BSD License text as described in Section 4.e of
7575+ the Trust Legal Provisions and are provided without warranty as
7676+ described in the Simplified BSD License.
7777+7878+ This document may contain material from IETF Documents or IETF
7979+ Contributions published or made publicly available before November
8080+ 10, 2008. The person(s) controlling the copyright in some of this
8181+ material may not have granted the IETF Trust the right to allow
8282+ modifications of such material outside the IETF Standards Process.
8383+ Without obtaining an adequate license from the person(s) controlling
8484+ the copyright in such materials, this document may not be modified
8585+ outside the IETF Standards Process, and derivative works of it may
8686+ not be created outside the IETF Standards Process, except to format
8787+ it for publication as an RFC or to translate it into languages other
8888+ than English.
8989+9090+9191+9292+9393+9494+9595+9696+9797+9898+9999+100100+101101+102102+103103+104104+105105+106106+107107+108108+109109+110110+111111+112112+113113+114114+Klensin Standards Track [Page 2]
115115+116116+RFC 5890 IDNA Definitions August 2010
117117+118118+119119+Table of Contents
120120+121121+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
122122+ 1.1. IDNA2008 . . . . . . . . . . . . . . . . . . . . . . . . . 4
123123+ 1.1.1. Audiences . . . . . . . . . . . . . . . . . . . . . . 4
124124+ 1.1.2. Normative Language . . . . . . . . . . . . . . . . . . 5
125125+ 1.2. Road Map of IDNA2008 Documents . . . . . . . . . . . . . . 5
126126+ 2. Definitions and Terminology . . . . . . . . . . . . . . . . . 6
127127+ 2.1. Characters and Character Sets . . . . . . . . . . . . . . 6
128128+ 2.2. DNS-Related Terminology . . . . . . . . . . . . . . . . . 6
129129+ 2.3. Terminology Specific to IDNA . . . . . . . . . . . . . . . 7
130130+ 2.3.1. LDH Label . . . . . . . . . . . . . . . . . . . . . . 7
131131+ 2.3.2. Terms for IDN Label Codings . . . . . . . . . . . . . 11
132132+ 2.3.2.1. IDNA-valid strings, A-label, and U-label . . . . . 11
133133+ 2.3.2.2. NR-LDH Label . . . . . . . . . . . . . . . . . . . 13
134134+ 2.3.2.3. Internationalized Domain Name and
135135+ Internationalized Label . . . . . . . . . . . . . 13
136136+ 2.3.2.4. Label Equivalence . . . . . . . . . . . . . . . . 14
137137+ 2.3.2.5. ACE Prefix . . . . . . . . . . . . . . . . . . . . 14
138138+ 2.3.2.6. Domain Name Slot . . . . . . . . . . . . . . . . . 14
139139+ 2.3.3. Order of Characters in Labels . . . . . . . . . . . . 15
140140+ 2.3.4. Punycode is an Algorithm, Not a Name or Adjective . . 15
141141+ 3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16
142142+ 4. Security Considerations . . . . . . . . . . . . . . . . . . . 16
143143+ 4.1. General Issues . . . . . . . . . . . . . . . . . . . . . . 16
144144+ 4.2. U-label Lengths . . . . . . . . . . . . . . . . . . . . . 16
145145+ 4.3. Local Character Set Issues . . . . . . . . . . . . . . . . 17
146146+ 4.4. Visually Similar Characters . . . . . . . . . . . . . . . 17
147147+ 4.5. IDNA Lookup, Registration, and the Base DNS
148148+ Specifications . . . . . . . . . . . . . . . . . . . . . . 18
149149+ 4.6. Legacy IDN Label Strings . . . . . . . . . . . . . . . . . 18
150150+ 4.7. Security Differences from IDNA2003 . . . . . . . . . . . . 19
151151+ 4.8. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 20
152152+ 5. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 20
153153+ 6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
154154+ 6.1. Normative References . . . . . . . . . . . . . . . . . . . 20
155155+ 6.2. Informative References . . . . . . . . . . . . . . . . . . 21
156156+157157+158158+159159+160160+161161+162162+163163+164164+165165+166166+167167+168168+169169+170170+Klensin Standards Track [Page 3]
171171+172172+RFC 5890 IDNA Definitions August 2010
173173+174174+175175+1. Introduction
176176+177177+1.1. IDNA2008
178178+179179+ This document is one of a collection that, together, describe the
180180+ protocol and usage context for a revision of Internationalized Domain
181181+ Names for Applications (IDNA) that was largely completed in 2008,
182182+ known within the series and elsewhere as "IDNA2008". The series
183183+ replaces an earlier version of IDNA [RFC3490] [RFC3491]. For
184184+ convenience, that version of IDNA is referred to in these documents
185185+ as "IDNA2003". The newer version continues to use the Punycode
186186+ algorithm [RFC3492] and ACE (ASCII-compatible encoding) prefix from
187187+ that earlier version. The document collection is described in
188188+ Section 1.2. As indicated there, this document provides definitions
189189+ and other material that are common to the set.
190190+191191+1.1.1. Audiences
192192+193193+ While many IETF specifications are directed exclusively to protocol
194194+ implementers, the character of IDNA requires that it be understood
195195+ and properly used by those whose responsibilities include making
196196+ decisions about:
197197+198198+ o what names are permitted in DNS zone files,
199199+200200+ o policies related to names and naming, and
201201+202202+ o the handling of domain name strings in files and systems, even
203203+ with no immediate intention of looking them up.
204204+205205+ This document and those documents concerned with the protocol
206206+ definition, rules for handling strings that include characters
207207+ written right to left, and the actual list of characters and
208208+ categories will be of primary interest to protocol implementers.
209209+ This document and the one containing explanatory material will be of
210210+ primary interest to others, although they may have to fill in some
211211+ details by reference to other documents in the set.
212212+213213+ This document and the associated ones are written from the
214214+ perspective of an IDNA-aware user, application, or implementation.
215215+ While they may reiterate fundamental DNS rules and requirements for
216216+ the convenience of the reader, they make no attempt to be
217217+ comprehensive about DNS principles and should not be considered as a
218218+ substitute for a thorough understanding of the DNS protocols and
219219+ specifications.
220220+221221+222222+223223+224224+225225+226226+Klensin Standards Track [Page 4]
227227+228228+RFC 5890 IDNA Definitions August 2010
229229+230230+231231+1.1.2. Normative Language
232232+233233+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
234234+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
235235+ document are to be interpreted as described in RFC 2119 [RFC2119].
236236+237237+1.2. Road Map of IDNA2008 Documents
238238+239239+ IDNA2008 consists of the following documents:
240240+241241+ o This document, containing definitions and other material that are
242242+ needed for understanding other documents in the set. It is
243243+ referred to informally in other documents in the set as "Defs" or
244244+ "Definitions".
245245+246246+ o A document, RFC 5894 [RFC5894], that provides an overview of the
247247+ protocol and associated tables together with explanatory material
248248+ and some rationale for the decisions that led to IDNA2008. That
249249+ document also contains advice for registry operations and those
250250+ who use Internationalized Domain Names (IDNs). It is referred to
251251+ informally in other documents in the set as "Rationale". It is
252252+ not normative.
253253+254254+ o A document, RFC 5891 [RFC5891], that describes the core IDNA2008
255255+ protocol and its operations. In combination with the Bidi
256256+ document, described immediately below, it explicitly updates and
257257+ replaces RFC 3490. It is referred to informally in other
258258+ documents in the set as "Protocol".
259259+260260+ o A document, RFC 5893 [RFC5893], that specifies special rules
261261+ (Bidi) for labels that contain characters that are written from
262262+ right to left.
263263+264264+ o A specification, RFC 5892 [RFC5892], of the categories and rules
265265+ that identify the code points allowed in a label written in native
266266+ character form (defined more specifically as a "U-label" in
267267+ Section 2.3.2.1 below), based on Unicode 5.2 [Unicode52] code
268268+ point assignments and additional rules unique to IDNA2008. The
269269+ Unicode-based rules are expected to be stable across Unicode
270270+ updates and hence independent of Unicode versions. That
271271+ specification obsoletes RFC 3491 and IDN use of the tables to
272272+ which it refers. It is referred to informally in other documents
273273+ in the set as "Tables".
274274+275275+276276+277277+278278+279279+280280+281281+282282+Klensin Standards Track [Page 5]
283283+284284+RFC 5890 IDNA Definitions August 2010
285285+286286+287287+ o A document [IDNA2008-Mapping] that discusses the issue of mapping
288288+ characters into other characters and that provides guidance for
289289+ doing so when that is appropriate. That document, referred to
290290+ informally as "Mapping", provides advice; it is not a required
291291+ part of IDNA.
292292+293293+2. Definitions and Terminology
294294+295295+2.1. Characters and Character Sets
296296+297297+ A code point is an integer value in the codespace of a coded
298298+ character set. In Unicode, these are integers from 0 to 0x10FFFF.
299299+300300+ Unicode [Unicode52] is a coded character set containing somewhat over
301301+ 100,000 characters assigned to code points as of version 5.2. A
302302+ single Unicode code point is denoted in these documents by "U+"
303303+ followed by four to six hexadecimal digits, while a range of Unicode
304304+ code points is denoted by two four to six digit hexadecimal numbers
305305+ separated by "..", with no prefixes.
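A minimal sketch of the "U+" notation described above, assuming a hypothetical helper named format_code_point (not defined by this document): it rejects values outside the Unicode codespace and prints at least four uppercase hexadecimal digits, growing to five or six for larger code points.

```c
#include <assert.h>
#include <stdio.h>

/* Largest code point in the Unicode codespace. */
#define UNICODE_MAX 0x10FFFFul

/* Hypothetical helper: write "U+XXXX" (four to six hex digits) into buf.
   Returns the number of characters written, or -1 if cp is outside the
   codespace or the buffer is too small. */
int format_code_point(unsigned long cp, char *buf, size_t size)
{
    int n;

    assert(buf != NULL);
    if (cp > UNICODE_MAX) return -1;

    /* %04lX pads to four digits and expands for larger values. */
    n = snprintf(buf, size, "U+%04lX", cp);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For example, the hyphen-minus formats as U+002D and the last code point as U+10FFFF.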
306306+307307+ ASCII means US-ASCII [ASCII], a coded character set containing 128
308308+ characters associated with code points in the range 0000..007F.
309309+ Unicode is a superset of ASCII and may be thought of as a
310310+ generalization of it; it includes all the ASCII characters and
311311+ associates them with the equivalent code points.
312312+313313+ "Letters" are, informally, generalizations from the ASCII and
314314+ common-sense understanding of that term, i.e., characters that are
315315+ used to write text and that are not digits, symbols, or punctuation.
316316+ Formally, they are characters with a Unicode General Category value
317317+ starting in "L" (see Section 4.5 of The Unicode Standard
318318+ [Unicode52]).
319319+320320+2.2. DNS-Related Terminology
321321+322322+ When discussing the DNS, this document generally assumes the
323323+ terminology used in the DNS specifications [RFC1034] [RFC1035] as
324324+ subsequently modified [RFC1123] [RFC2181]. The term "lookup" is used
325325+ to describe the combination of operations performed by the IDNA2008
326326+ protocol and those actually performed by a DNS resolver. The process
327327+ of placing an entry into the DNS is referred to as "registration".
328328+ This is similar to common contemporary usage of that term in other
329329+ contexts. Consequently, any DNS zone administration is described as
330330+ a "registry", and the terms "registry" and "zone administrator" are
331331+ used interchangeably, regardless of the actual administrative
332332+ arrangements or level in the DNS tree. More details about that
333333+ relationship are included in the Rationale document.
341341+342342+343343+ The term "LDH code point" is defined in this document to refer to the
344344+ code points associated with ASCII letters (Unicode code points
345345+ 0041..005A and 0061..007A), digits (0030..0039), and the hyphen-minus
346346+ (U+002D). "LDH" is an abbreviation for "letters, digits, hyphen" but
347347+ is used specifically in this document to refer to the set of naming
348348+ rules described in Section 2.3.1 below.
349349+350350+ The base DNS specifications [RFC1034] [RFC1035] discuss "domain
351351+ names" and "hostnames", but many people use the terms
352352+ interchangeably, as do sections of these specifications. Lack of
353353+ clarity about that terminology has contributed to confusion about
354354+ intent in some cases. These documents generally use the term "domain
355355+ name". When they refer to, e.g., hostname syntax restrictions, they
356356+ explicitly cite the relevant defining documents. The remaining
357357+ definitions in this subsection are essentially a review: if there is
358358+ any perceived difference between those definitions and the
359359+ definitions in the base DNS documents or those cited below, the
360360+ definitions in the other documents take precedence.
361361+362362+ A label is an individual component of a domain name. Labels are
363363+ usually shown separated by dots; for example, the domain name
364364+ "www.example.com" is composed of three labels: "www", "example", and
365365+ "com". (The complete name convention using a trailing dot described
366366+ in RFC 1123 [RFC1123], which can be explicit as in "www.example.com."
367367+ or implicit as in "www.example.com", is not considered in this
368368+ specification.) IDNA extends the set of usable characters in labels
369369+ that are treated as text (as distinct from the binary string labels
370370+ discussed in RFC 1035 and RFC 2181 [RFC2181] and bitstring ones
371371+ [RFC2673]), but only in certain contexts. The different contexts for
372372+ different sets of usable characters are outlined in the next section.
373373+ For the rest of this document and in the related ones, the term
374374+ "label" is shorthand for "text label", and "every label" means "every
375375+ text label", including the expanded context.
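As a minimal sketch of the label decomposition described above, the following OCaml splits a dotted name into its labels. The function name `labels_of_domain` is an assumption made for this example; like the specification, it does not treat a trailing dot specially.

```ocaml
(* Split a dotted domain name into its labels, leftmost label first.
   Hypothetical helper; a trailing dot is not handled here. *)
let labels_of_domain (name : string) : string list =
  String.split_on_char '.' name
```

Applied to `"www.example.com"`, this yields the three labels `"www"`, `"example"`, and `"com"`.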
376376+377377+2.3. Terminology Specific to IDNA
378378+379379+ This section defines some terminology to reduce dependence on terms
380380+ and definitions that have been problematic in the past. The
381381+ relationships among these definitions are illustrated in Figure 1 and
382382+ Figure 2. In the first of those figures, the parenthesized numbers
383383+ refer to the notes below the figure.
384384+385385+2.3.1. LDH Label
386386+387387+ This is the classical label form used, albeit with some additional
388388+ restrictions, in hostnames [RFC0952]. Its syntax is identical to
389389+ that described as the "preferred name syntax" in Section 3.5 of RFC
390390+ 1034 [RFC1034] as modified by RFC 1123 [RFC1123]. Briefly, it is a
399399+ string consisting of ASCII letters, digits, and the hyphen with the
400400+ further restriction that the hyphen cannot appear at the beginning or
401401+ end of the string. Like all DNS labels, its total length must not
402402+ exceed 63 octets.
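The LDH label rules just stated (ASCII letters, digits, and hyphen; no hyphen at either end; at most 63 octets) can be checked with a few lines of OCaml. This is only a sketch: `is_ldh_char` and `is_ldh_label` are hypothetical names, and the empty string is rejected here as an assumption, since a label is a non-empty component of a name.

```ocaml
(* True for the LDH code points: ASCII letters, digits, hyphen-minus. *)
let is_ldh_char (c : char) : bool =
  match c with
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '-' -> true
  | _ -> false

(* True if [s] obeys the LDH label syntax: 1 to 63 octets, LDH code
   points only, and no hyphen in the first or last position. *)
let is_ldh_label (s : string) : bool =
  let n = String.length s in
  let rec all_ldh i = i >= n || (is_ldh_char s.[i] && all_ldh (i + 1)) in
  n >= 1 && n <= 63 && s.[0] <> '-' && s.[n - 1] <> '-' && all_ldh 0
```

Under these rules `"example"` and `"a-b"` are accepted, while `"-abcd"`, `"xyz-"`, and `"_tcp"` are not.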
403403+404404+ LDH labels include the specialized labels used by IDNA (described as
405405+ "A-labels" below) and some additional restricted forms (also
406406+ described below).
407407+408408+ To facilitate clear description, two new subsets of LDH labels are
409409+ created by the introduction of IDNA. These are called Reserved LDH
410410+ labels (R-LDH labels) and Non-Reserved LDH labels (NR-LDH labels).
411411+ Reserved LDH labels, known as "tagged domain names" in some other
412412+ contexts, have the property that they contain "--" in the third and
413413+ fourth characters but which otherwise conform to LDH label rules.
414414+ Only a subset of the R-LDH labels can be used in IDNA-aware
415415+ applications. That subset consists of the class of labels that begin
416416+ with the prefix "xn--" (case independent), but otherwise conform to
417417+ the rules for LDH labels. That subset is called "XN-labels" in this
418418+ set of documents. XN-labels are further divided into those whose
419419+ remaining characters (after the "xn--") are valid output of the
420420+ Punycode algorithm [RFC3492] and those that are not (see below). The
421421+ XN-labels that are valid Punycode output are known as "A-labels" if
422422+ they also meet the other criteria for IDNA-validity described below.
423423+ Because LDH labels (and, indeed, any DNS label) must not be more than
424424+ 63 octets in length, the portion of an XN-label derived from the
425425+ Punycode algorithm is limited to no more than 59 ASCII characters.
426426+ Non-Reserved LDH labels are the set of valid LDH labels that do not
427427+ have "--" in the third and fourth positions.
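The split of LDH labels into NR-LDH labels, XN-labels, and the remaining R-LDH labels depends only on the ASCII form of the label, so it can be sketched without a Punycode decoder. The type and function names below are assumptions made for this example. Telling an A-label from a Fake A-label needs Punycode decoding and the IDNA validity tests, which this sketch does not attempt.

```ocaml
(* Hypothetical classification of an ASCII label per Section 2.3.1. *)
type ldh_class =
  | Not_ldh      (* fails the basic LDH label rules *)
  | Nr_ldh       (* LDH label without "--" in positions 3 and 4 *)
  | Xn_label     (* R-LDH label starting with "xn--", case independent *)
  | Other_r_ldh  (* R-LDH label with another prefix; not valid for IDNA *)

let is_ldh_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '-' -> true
  | _ -> false

let is_ldh_label (s : string) : bool =
  let n = String.length s in
  let rec all_ldh i = i >= n || (is_ldh_char s.[i] && all_ldh (i + 1)) in
  n >= 1 && n <= 63 && s.[0] <> '-' && s.[n - 1] <> '-' && all_ldh 0

let classify (s : string) : ldh_class =
  if not (is_ldh_label s) then Not_ldh
  else if String.length s >= 4 && s.[2] = '-' && s.[3] = '-' then
    (* Reserved LDH label: "--" in the third and fourth positions. *)
    if String.lowercase_ascii (String.sub s 0 2) = "xn" then Xn_label
    else Other_r_ldh
  else Nr_ldh

(* The 63-octet label limit minus the 4-octet "xn--" prefix leaves
   59 ASCII characters for the Punycode-derived portion. *)
let max_punycode_length : int = 63 - String.length "xn--"
```

An IDNA-aware caller would pass an `Xn_label` result on to Punycode decoding and validation, accept `Nr_ldh` as is, and refuse `Other_r_ldh`.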
428428+429429+ A consequence of the restrictions on valid characters in the native
430430+ Unicode character form (see U-labels) turns out to be that mixed-case
431431+ annotation, of the sort outlined in Appendix A of RFC 3492 [RFC3492],
432432+ is never useful. Therefore, since a valid A-label is the result of
433433+ Punycode encoding of a U-label, A-labels should be produced only in
434434+ lowercase, despite matching other (mixed-case or uppercase) potential
435435+ labels in the DNS.
436436+437437+ Some strings that are prefixed with "xn--" to form labels may not be
438438+ the output of the Punycode algorithm, may fail the other tests
439439+ outlined below, or may violate other IDNA restrictions and thus are
440440+ also not valid IDNA labels. They are called "Fake A-labels" for
441441+ convenience.
442442+443443+ Labels within the class of R-LDH labels that are not prefixed with
444444+ "xn--" are also not valid IDNA labels. To allow for future use of
445445+ mechanisms similar to IDNA, those labels MUST NOT be processed as
455455+ ordinary LDH labels by IDNA-conforming programs and SHOULD NOT be
456456+ mixed with IDNA labels in the same zone.
457457+458458+ These distinctions among possible LDH labels are only of significance
459459+ for software that is IDNA-aware or for future extensions that use
460460+ mechanisms based on the same "prefix and encoding" model. For
461461+ IDNA-aware systems, the valid label types are: A-labels, U-labels,
462462+ and NR-LDH labels.
463463+464464+ IDNA labels come in two flavors: an ACE-encoded form and a Unicode
465465+ (native character) form. These are referred to as A-labels and
466466+ U-labels, respectively, and are described in detail in the next
467467+ section.
509509+510510+511511+                                ASCII Label
512512+     __________________________________________________________________
513513+     |                                                                |
514514+     |  ____________________ LDH Label (1) (4) ________________       |
515515+     | |  _________________________________                    |      |
516516+     | | |IDN Reserved LDH Labels          |                   |      |
517517+     | | | ("??--") or R-LDH Labels        |   _____________   |      |
518518+     | | |  _____________________________  |  |NON-RESERVED |  |      |
519519+     | | | | XN-labels                   | |  | LDH Labels  |  |      |
520520+     | | | |  ___________   __________   | |  |   (NR-LDH   |  |      |
521521+     | | | | | A-labels  | | Fake (3) |  | |  |   labels)   |  |      |
522522+     | | | | | "xn--"(2) | | A-labels |  | |  |_____________|  |      |
523523+     | | | | |___________| |__________|  | |                   |      |
524524+     | | | |_____________________________| |                   |      |
525525+     | | |_________________________________|                   |      |
526526+     | |_______________________________________________________|      |
527527+     |                                                                |
528528+     |  ___________NON-LDH label________                              |
529529+     | |   ____________________         |                             |
530530+     | |  | Underscore labels  |        |                             |
531531+     | |  |     e.g., _tcp     |        |                             |
532532+     | |  |____________________|        |                             |
533533+     | |  | Labels with leading|        |                             |
534534+     | |  |   or trailing      |        |                             |
535535+     | |  |   hyphens "-abcd"  |        |                             |
536536+     | |  |     or "xyz-"      |        |                             |
537537+     | |  |     or "-uvw-"     |        |                             |
538538+     | |  |____________________|        |                             |
539539+     | |  | Labels with other  |        |                             |
540540+     | |  | non-LDH ASCII chars|        |                             |
541541+     | |  |     e.g., #$%_     |        |                             |
542542+     | |  |____________________|        |                             |
543543+     | |________________________________|                             |
544544+     |________________________________________________________________|
545545+546546+ (1) ASCII letters (uppercase and lowercase), digits,
547547+ hyphen. Hyphen may not appear in first or last
548548+ position. No more than 63 octets.
549549+ (2) Note that the string following "xn--" must
550550+ be the valid output of the Punycode algorithm
551551+ and must be convertible into valid U-label form.
552552+ (3) Note that a Fake A-label has a prefix "xn--"
553553+ but the remainder of the label is NOT the valid
554554+ output of the Punycode algorithm.
555555+ (4) LDH label subtypes are indistinguishable to
556556+ applications that are not IDNA-aware.
557557+558558+ Figure 1: IDNA and Related DNS Terminology Space -- ASCII Labels
565565+566566+567567+                      __________________________
568568+                      |       Non-ASCII        |
569569+                      |                        |
570570+                      |    _________________   |
571571+                      |   |   U-label (5)   |  |
572572+                      |   |_________________|  |
573573+                      |   |                 |  |
574574+                      |   |  Binary Label   |  |
575575+                      |   |   (including    |  |
576576+                      |   |  high bit on)   |  |
577577+                      |   |_________________|  |
578578+                      |   |                 |  |
579579+                      |   |   Bit String    |  |
580580+                      |   |      Label      |  |
581581+                      |   |_________________|  |
582582+                      |________________________|
583583+584584+ (5) To applications that are not IDNA-aware, U-labels
585585+ are indistinguishable from Binary ones.
586586+587587+ Figure 2: Non-ASCII Labels
588588+589589+2.3.2. Terms for IDN Label Codings
590590+591591+2.3.2.1. IDNA-valid strings, A-label, and U-label
592592+593593+ For IDNA-aware applications, the three types of valid labels are
594594+ "A-labels", "U-labels", and "NR-LDH labels", each of which is defined
595595+ below. The relationships among them are illustrated in Figure 1 and
596596+ Figure 2.
597597+598598+ o A string is "IDNA-valid" if it meets all of the requirements of
599599+ these specifications for an IDNA label. IDNA-valid strings may
600600+ appear in either of the two forms defined immediately below, or
601601+ may be drawn from the NR-LDH label subset. IDNA-valid strings
602602+ must also conform to all basic DNS requirements for labels. These
603603+ documents make specific reference to the form appropriate to any
604604+ context in which the distinction is important.
605605+606606+ o An "A-label" is the ASCII-Compatible Encoding (ACE, see
607607+ Section 2.3.2.5) form of an IDNA-valid string. It must be a
608608+ complete label: IDNA is defined for labels, not for parts of them
609609+ and not for complete domain names. This means, by definition,
610610+ that every A-label will begin with the IDNA ACE prefix, "xn--"
611611+ (see Section 2.3.2.5), followed by a string that is a valid output
612612+ of the Punycode algorithm [RFC3492] and hence a maximum of 59
613613+ ASCII characters in length. The prefix and string together must
614614+ conform to all requirements for a label that can be stored in the
623623+ DNS including conformance to the rules for LDH labels
624624+ (Section 2.3.1). If and only if a string meeting the above
625625+ requirements can be decoded into a U-label is it an A-label.
626626+627627+ o A "U-label" is an IDNA-valid string of Unicode characters, in
628628+ Normalization Form C (NFC) and including at least one non-ASCII
629629+ character, expressed in a standard Unicode Encoding Form (such as
630630+ UTF-8). It is also subject to the constraints about permitted
631631+ characters that are specified in Section 4.2 of the Protocol
632632+ document and the rules in Sections 2 and 3 of the Tables
633633+ document, the Bidi constraints in that document if it contains any
634634+ character from scripts that are written right to left, and the
635635+ symmetry constraint described immediately below. Conversions
636636+ between U-labels and A-labels are performed according to the
637637+ "Punycode" specification [RFC3492], adding or removing the ACE
638638+ prefix as needed.
639639+640640+ To be valid, U-labels and A-labels must obey an important symmetry
641641+ constraint. While that constraint may be tested in any of several
642642+ ways, an A-label A1 must be capable of being produced by conversion
643643+ from a U-label U1, and that U-label U1 must be capable of being
644644+ produced by conversion from A-label A1. Among other things, this
645645+ implies that both U-labels and A-labels must be strings in Unicode
646646+ NFC [Unicode-UAX15] normalized form. These strings MUST contain only
647647+ characters specified elsewhere in this document series, and only in
648648+ the contexts indicated as appropriate.
649649+650650+ Any rules or conventions that apply to DNS labels in general apply to
651651+ whichever of the U-label or A-label would be more restrictive. There
652652+ are two exceptions to this principle. First, the restriction to
653653+ ASCII characters does not apply to the U-label. Second, expansion of
654654+ the A-label form to a U-label may produce strings that are much
655655+ longer than the normal 63 octet DNS limit (potentially up to 252
656656+ characters) due to the compression efficiency of the Punycode
657657+ algorithm. Such extended-length U-labels are valid from the
658658+ standpoint of IDNA, but caution should be exercised as shorter limits
659659+ may be imposed by some applications.
660660+661661+ For context, applications that are not IDNA-aware treat all LDH
662662+ labels as valid for appearance in DNS zone files and queries and some
663663+ of them may permit additional types of labels (i.e., not impose the
664664+ LDH restriction). IDNA-aware applications permit only A-labels and
665665+ NR-LDH labels to appear in zone files and queries. U-labels can
666666+ appear, along with the other two, in presentation and user interface
667667+ forms, and in protocols that use IDNA forms but that do not involve
668668+ the DNS itself.
677677+678678+679679+ Specifically, for IDNA-aware applications and contexts, the three
680680+ allowed categories are A-label, U-label, and NR-LDH label. Of the
681681+ Reserved LDH labels (R-LDH labels) only A-labels are valid for IDNA
682682+ use.
683683+684684+ Strings that appear to be A-labels or U-labels are processed in
685685+ various operations of the Protocol document [RFC5891]. Those strings
686686+ are not yet demonstrably conformant with the conditions outlined
687687+ above because they are in the process of validation. Such strings
688688+ may be referred to as "unvalidated", "putative", or "apparent", or as
689689+ being "in the form of" one of the label types to indicate that they
690690+ have not been verified to meet the specified conformance
691691+ requirements.
692692+693693+ Unvalidated A-labels are known only to be XN-labels, while Fake
694694+ A-labels have been demonstrated to fail some of the A-label tests.
695695+ Similarly, unvalidated U-labels are simply non-ASCII labels that may
696696+ or may not meet the requirements for U-labels.
697697+698698+2.3.2.2. NR-LDH Label
699699+700700+ These specifications use the term "NR-LDH label" strictly to refer to
701701+ an all-ASCII label that obeys the LDH label syntax discussed in
702702+ Section 2.3.1 and that is neither an IDN nor a label form reserved by
703703+ IDNA (R-LDH label). It should be stressed that all A-labels obey the
704704+ "hostname" [RFC0952] rules other than the length restriction in those
705705+ rules.
706706+707707+2.3.2.3. Internationalized Domain Name and Internationalized Label
708708+709709+ An "internationalized domain name" (IDN) is a domain name that
710710+ contains at least one A-label or U-label, but that otherwise may
711711+ contain any mixture of NR-LDH labels, A-labels, or U-labels. Just as
712712+ has been the case with ASCII names, some DNS zone administrators may
713713+ impose restrictions, beyond those imposed by DNS or IDNA, on the
714714+ characters or strings that may be registered as labels in their
715715+ zones. Because of the diversity of characters that can be used in a
716716+ U-label and the confusion they might cause, such restrictions are
717717+ mandatory for IDN registries and zones even though the particular
718718+ restrictions are not part of these specifications (the issue is
719719+ discussed in more detail in Section 4.3 of the Protocol document
720720+ [RFC5891]). Because these restrictions, commonly known as "registry
721721+ restrictions", only affect what can be registered and not lookup
722722+ processing, they have no effect on the syntax or semantics of DNS
723723+ protocol messages; a query for a name that matches no records will
724724+ yield the same response regardless of the reason why it is not in the
725725+ zone. Clients issuing queries or interpreting responses cannot be
735735+ assumed to have any knowledge of zone-specific restrictions or
736736+ conventions. See the section on registration policy in the Rationale
737737+ document [RFC5894] for additional discussion.
738738+739739+ "Internationalized label" is used when a term is needed to refer to a
740740+ single label of an IDN, i.e., one that might be any of an NR-LDH
741741+ label, A-label, or U-label. There are some standardized DNS label
742742+ formats, such as the "underscore labels" used for service location
743743+ (SRV) records [RFC2782], that do not fall into any of the three
744744+ categories and hence are not internationalized labels.
745745+746746+2.3.2.4. Label Equivalence
747747+748748+ In IDNA, equivalence of labels is defined in terms of the A-labels.
749749+ If the A-labels are equal in a case-independent comparison, then the
750750+ labels are considered equivalent, no matter how they are represented.
751751+ Because of the isomorphism of A-labels and U-labels in IDNA2008, it
752752+ is possible to compare U-labels directly; see the Protocol document
753753+ [RFC5891] for details. Traditional LDH labels already have a notion
754754+ of equivalence: within that list of characters, uppercase and
755755+ lowercase are considered equivalent. The IDNA notion of equivalence
756756+ is an extension of that older notion but, because the protocol does
757757+ not specify any mandatory mapping and only those isomorphic forms are
758758+ considered, the only equivalents are:
759759+760760+ o Exact (bit-string identity) matches between a pair of U-labels.
761761+762762+ o Matches between a pair of A-labels, using normal DNS
763763+ case-insensitive matching rules.
764764+765765+ o Equivalence between a U-label and an A-label determined by
766766+ translating the U-label form into an A-label form and then testing
767767+ for a match between the A-labels using normal DNS case-insensitive
768768+ matching rules.
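The three equivalence cases listed above can be sketched in OCaml as follows. All function names are assumptions made for this example. The U-label to A-label conversion is deliberately left as a caller-supplied function (`to_a_label`), because this sketch does not implement Punycode or IDNA validation.

```ocaml
(* Two A-labels match under normal DNS case-insensitive ASCII rules. *)
let a_labels_equivalent (a1 : string) (a2 : string) : bool =
  String.lowercase_ascii a1 = String.lowercase_ascii a2

(* Two U-labels match only if they are identical as bit strings.
   Both are assumed to be validated U-labels, hence already in NFC. *)
let u_labels_equivalent (u1 : string) (u2 : string) : bool =
  String.equal u1 u2

(* A U-label and an A-label are compared by first converting the
   U-label with [to_a_label], a hypothetical caller-supplied
   conversion, and then comparing the two A-labels. *)
let mixed_equivalent ~(to_a_label : string -> string)
    (u : string) (a : string) : bool =
  a_labels_equivalent (to_a_label u) a
```

Passing the conversion as an argument keeps the comparison logic independent of whichever Punycode implementation the surrounding code uses.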
769769+770770+2.3.2.5. ACE Prefix
771771+772772+ The "ACE prefix" is defined in this document to be a string of ASCII
773773+ characters, "xn--", that appears at the beginning of every A-label.
774774+ "ACE" stands for "ASCII-Compatible Encoding".
775775+776776+2.3.2.6. Domain Name Slot
777777+778778+ A "domain name slot" is defined in this document to be a protocol
779779+ element or a function argument or a return value (and so on)
780780+ explicitly designated for carrying a domain name. Examples of domain
781781+ name slots include the QNAME field of a DNS query; the name argument
782782+ of the gethostbyname() or getaddrinfo() standard C library functions;
791791+ the part of an email address following the at sign ("@") in the
792792+ parameter to the SMTP MAIL or RCPT commands or the "From:" field of
793793+ an email message header; and the host portion of the URI in the "src"
794794+ attribute of an HTML "<IMG>" tag. A string that has the syntax of a
795795+ domain name but that appears in general text is not in a domain name
796796+ slot. For example, a domain name appearing in the plain text body of
797797+ an email message is not occupying a domain name slot.
798798+799799+ An "IDNA-aware domain name slot" is defined for this set of documents
800800+ to be a domain name slot explicitly designated for carrying an
801801+ internationalized domain name as defined in this document. The
802802+ designation may be static (for example, in the specification of the
803803+ protocol or interface) or dynamic (for example, as a result of
804804+ negotiation in an interactive session).
805805+806806+ Name slots that are not IDNA-aware obviously include any domain name
807807+ slot whose specification predates IDNA. Note that the requirements
808808+ of some protocols that use the DNS for data storage prevent the use
809809+ of IDNs. For example, the format required for the underscore labels
810810+ used by the service location protocol [RFC2782] precludes
811811+ representation of a non-ASCII label in the DNS using A-labels because
812812+ those SRV-related labels must start with underscores. Of course,
813813+ non-ASCII IDN labels may be part of a domain name that also includes
814814+ underscore labels.
815815+816816+2.3.3. Order of Characters in Labels
817817+818818+ Because IDN labels may contain characters that are read, and
819819+ preferentially displayed, from right to left, there is a potential
820820+ ambiguity about which character in a label is "first". For the
821821+ purposes of these specifications, labels are considered, and
822822+ characters numbered, strictly in the order in which they appear "on
823823+ the wire". That order is equivalent to the leftmost character being
824824+ treated as first in a label that is read left to right and to the
825825+ rightmost character being first in a label that is read right to
826826+ left. The Bidi specification contains additional discussion of the
827827+ conditions that influence reading order.
828828+829829+2.3.4. Punycode is an Algorithm, Not a Name or Adjective
830830+831831+ There has been some confusion about whether a "Punycode string" does
832832+ or does not include the ACE prefix and about whether it is required
833833+ that such strings could have been the output of the ToASCII operation
834834+ (see RFC 3490, Section 4 [RFC3490]). This specification discourages
835835+ the use of the term "Punycode" to describe anything but the encoding
836836+ method and algorithm of RFC 3492 [RFC3492]. The terms defined above
837837+ are preferred as much more clear than the term "Punycode string".
845845+846846+847847+3. IANA Considerations
848848+849849+ IANA actions for this version of IDNA (IDNA2008) are specified in the
850850+ Tables document [RFC5892]. An overview of the relationships among
851851+ the various IANA registries appears in the Rationale document
852852+ [RFC5894]. This document does not specify any actions for IANA.
853853+854854+4. Security Considerations
855855+856856+4.1. General Issues
857857+858858+ Security on the Internet partly relies on the DNS. Thus, any change
859859+ to the characteristics of the DNS can change the security of much of
860860+ the Internet.
861861+862862+ Domain names are used by users to identify and connect to Internet
863863+ hosts and other network resources. The security of the Internet is
864864+ compromised if a user entering a single internationalized name is
865865+ connected to different servers based on different interpretations of
866866+ the internationalized domain name. In addition to characters that
867867+ are permitted by IDNA2003 and its mapping conventions (see
868868+ Section 4.6), the current specification changes the interpretation of
869869+ a few characters that were mapped to others in the earlier version;
870870+ zone administrators should be aware of the problems that this might
871871+ raise and take appropriate measures. The context for this issue is
872872+ discussed in more detail in the Rationale document [RFC5894].
873873+874874+ In addition to the Security Considerations material that appears in
875875+ this document, the Bidi document [RFC5893] contains a discussion of
876876+ security issues specific to labels containing characters from scripts
877877+ that are normally written right to left.
878878+879879+4.2. U-label Lengths
880880+881881+ Labels associated with the DNS have traditionally been limited to 63
882882+ octets by the general restrictions in RFC 1035 and by the need to
883883+ treat them as a six-bit string length followed by the string in
884884+ actual calls to the DNS. That format is used in some other
885885+ applications and, in general, the representations of domain names as
886886+ dot-separated labels and as length-string pairs have been treated as
887887+ interchangeable. Because A-labels (the form actually used in the
888888+ DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
889889+ in general, more compressed than UTF-16 or UTF-32), U-labels that
890890+ obey all of the relevant symmetry (and other) constraints of these
891891+ documents may be quite a bit longer, potentially up to 252 characters
892892+ (Unicode code points). A fully-qualified domain name containing
893893+ several such labels can obviously also exceed the nominal 255 octet
903903+ limit for such names. Application authors using U-labels must exert
904904+ due caution to avoid buffer overflow and truncation errors and
905905+ attacks in contexts where shorter strings are expected.
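Because the length of a U-label in code points, its length in UTF-8 octets, and the length of its A-label all differ, code that stores U-labels should check the size explicitly rather than assume the 63-octet limit. The OCaml below is a sketch under stated assumptions: the input is taken to be well-formed UTF-8, and the names `utf8_code_points` and `check_u_label` are invented for this example.

```ocaml
(* Count code points in a UTF-8 string by counting every byte that is
   not a continuation byte (10xxxxxx). Assumes well-formed UTF-8. *)
let utf8_code_points (s : string) : int =
  let n = ref 0 in
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n) s;
  !n

(* Refuse a UTF-8 encoded U-label that would not fit in a buffer of
   [max_octets] octets, instead of truncating it silently. *)
let check_u_label ~(max_octets : int) (u : string) : (string, string) result =
  let octets = String.length u in
  if octets > max_octets then
    Error (Printf.sprintf "label needs %d octets, limit is %d" octets max_octets)
  else Ok u
```

The UTF-8 string for "bücher", for example, has 6 code points but occupies 7 octets, so a limit expressed in characters and a limit expressed in octets are not interchangeable.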
906906+907907+4.3. Local Character Set Issues
908908+909909+ When systems use local character sets other than ASCII and Unicode,
910910+ these specifications leave the problem of converting between the
911911+ local character set and Unicode up to the application or local
912912+ system. If different applications (or different versions of one
913913+ application) implement different rules for conversions among coded
914914+ character sets, they could interpret the same name differently and
915915+ contact different servers. This problem is not solved by security
916916+ protocols, such as Transport Layer Security (TLS) [RFC5246], that do
917917+ not take local character sets into account.
918918+919919+4.4. Visually Similar Characters
920920+921921+ To help prevent confusion between characters that are visually
922922+ similar (sometimes called "confusables"), it is suggested that
923923+ implementations provide visual indications where a domain name
924924+ contains multiple scripts, especially when the scripts contain
925925+ characters that are easily confused visually, such as an omicron in
926926+ Greek mixed with Latin text. Such mechanisms can also be used to
927927+ show when a name contains a mixture of Simplified Chinese characters
928928+ with Traditional ones that have Simplified forms, or to distinguish
929929+ zero and one from uppercase "O" and lowercase "L". DNS zone
930930+ administrators may impose restrictions (subject to the limitations
931931+ identified elsewhere in these documents) that try to minimize
932932+ characters that have similar appearance or similar interpretations.
933933+934934+ If multiple characters appear in a label and the label consists only
935935+ of characters in one script, individual characters that might be
936936+ confused with others if compared separately may be unambiguous and
937937+ non-confusing. On the other hand, that observation makes labels
938938+ containing characters from more than one script (often called "mixed-
939939+ script labels") even more risky -- users will tend to see what they
940940+ expect to see and context is a powerful reinforcement to perception.
941941+ At the same time, while the risks associated with mixed-script labels
942942+ are clear, simply prohibiting them will not eliminate problems,
943943+ especially where closely related scripts are involved. For example,
944944+ there are many strings that are entirely in Greek or Cyrillic scripts
945945+ that can be confused with each other or with Latin script strings.
946946+947947+ It is worth noting that there are no comprehensive technical
948948+ solutions to the problems of confusable characters. One can reduce
949949+ the extent of the problems in various ways, but probably never
959959+ eliminate it. Some specific suggestions about identification and
960960+ handling of confusable characters appear in a Unicode Consortium
961961+ publication [Unicode-UTR36].
962962+963963+4.5. IDNA Lookup, Registration, and the Base DNS Specifications
964964+965965+ The Protocol specification [RFC5891] describes procedures for
966966+ registering and looking up labels that are not compatible with the
967967+ preferred syntax described in the base DNS specifications (see
968968+ Section 2.3.1) because they contain non-ASCII characters. These
969969+ procedures depend on the use of a special ASCII-compatible encoding
970970+ form that contains only characters permitted in hostnames by those
971971+ earlier specifications. The encoding used is Punycode [RFC3492]. No
972972+ security issues such as string length increases or new allowed values
973973+ are introduced by the encoding process or the use of these encoded
974974+ values, apart from those introduced by the ACE encoding itself.
975975+976976+ Domain names (or portions of them) are sometimes compared against a
977977+ set of domains to be given special treatment if a match occurs, e.g.,
978978+ treated as more privileged than others or blocked in some way. In
979979+ such situations, it is especially important that the comparisons be
980980+ done properly, as specified in the "Requirements" section of the
981981+ Protocol document [RFC5891]. For labels already in ASCII form, the
982982+ proper comparison reduces to the same case-insensitive ASCII
983983+ comparison that has always been used for ASCII labels although
984984+ IDNA-aware applications are expected to look up only A-labels and
985985+ NR-LDH labels, i.e., to avoid looking up R-LDH labels that are not
986986+ A-labels.
987987+988988+ The introduction of IDNA meant that any existing labels that start
989989+ with the ACE prefix would be construed as A-labels, at least until
990990+ they failed one of the relevant tests, whether or not that was the
991991+ intent of the zone administrator or registrant. There is no evidence
992992+ that this has caused any practical problems since RFC 3490 was
993993+ adopted, but the risk still exists in principle.
994994+995995+4.6. Legacy IDN Label Strings
996996+997997+ The URI Standard [RFC3986] and a number of application specifications
998998+ (e.g., SMTP [RFC5321] and HTTP [RFC2616]) do not permit non-ASCII
999999+ labels in DNS names used with those protocols, i.e., only the A-label
10001000+ form of IDNs is permitted in those contexts. If only A-labels are
10011001+ used, differences in interpretation between IDNA2003 and this version
10021002+ arise only for characters whose interpretations have actually changed
10031003+ (e.g., characters, such as ZWJ and ZWNJ, that were mapped to nothing
10041004+ in IDNA2003 and that are considered legitimate in some contexts by
10051005+ these specifications). Despite that prohibition, there are a
10061006+ significant number of files and databases on the Internet in which
10131013+10141014+10151015+ domain name strings appear in native-character form; a subset of
10161016+ those strings use native-character labels that require IDNA2003
10171017+ mapping to produce valid A-labels. The treatment of such labels will
10181018+ vary by types of applications and application-designer preference: in
10191019+ some situations, warnings to the user or outright rejection may be
10201020+ appropriate; in others, it may be preferable to attempt to apply the
10211021+ earlier mappings if lookup strictly conformant to these
10221022+ specifications fails or even to do lookups under both sets of rules.
10231023+ This general situation is discussed in more detail in the Rationale
10241024+ document [RFC5894]. However, in the absence of care by registries
10251025+ about how strings that could have different interpretations under
10261026+ IDNA2003 and the current specification are handled, it is possible
10271027+ that the differences could be used as a component of name-matching or
10281028+ name-confusion attacks. Such care is therefore appropriate.
10291029+10301030+4.7. Security Differences from IDNA2003
10311031+10321032+ The registration and lookup models described in this set of documents
10331033+ change the mechanisms available for lookup applications to determine
10341034+ the validity of labels they encounter. In some respects, the ability
10351035+ to test is strengthened. For example, putative labels that contain
10361036+ unassigned code points will now be rejected, while IDNA2003 permitted
10371037+ them (see the Rationale document [RFC5894] for a discussion of the
10381038+ reasons for this). On the other hand, the Protocol specification no
10391039+ longer assumes that the application that looks up a name will be able
10401040+ to determine, and apply, information about the protocol version used
10411041+ in registration. In theory, that may increase risk since the
10421042+ application will be able to do less pre-lookup validation. In
10431043+ practice, the protection afforded by that test has been largely
10441044+ illusory for reasons explained in RFC 4690 [RFC4690] and elsewhere in
10451045+ these documents.
10461046+10471047+ Any change to the Stringprep [RFC3454] procedure that is profiled and
10481048+ used in IDNA2003, or, more broadly, the IETF's model of the use of
10491049+ internationalized character strings in different protocols, creates
10501050+ some risk of inadvertent changes to those protocols, invalidating
10511051+ deployed applications or databases, and so on. But these
10521052+ specifications do not change Stringprep at all; they merely bypass
10531053+ it. Because these documents do not depend on Stringprep, the
10541054+ question of upgrading other protocols that do have that dependency
10551055+ can be left to experts on those protocols: the IDNA changes and
10561056+ possible upgrades to security protocols or conventions are
10571057+ independent issues.
10691069+10701070+10711071+4.8. Summary
10721072+10731073+ No mechanism involving names or identifiers alone can protect against
10741074+ a wide variety of security threats and attacks that are largely
10751075+ independent of the naming or identification system. These attacks
10761076+ include spoofed pages, DNS query trapping and diversion, and so on.
10771077+10781078+5. Acknowledgments
10791079+10801080+ The initial version of this document was created largely by
10811081+ extracting text from early draft versions of the Rationale document
10821082+ [RFC5894]. See the section of this name and the one entitled
10831083+ "Contributors", in it.
10841084+10851085+ Specific textual suggestions after the extraction process came from
10861086+ Vint Cerf, Lisa Dusseault, Bill McQuillan, Andrew Sullivan, and Ken
10871087+ Whistler. Other changes were made in response to more general
10881088+ comments, lists of concerns or specific errors from participants in
10891089+ the Working Group and other observers, including Lyman Chapin, James
10901090+ Mitchell, Subramanian Moonesamy, and Dan Winship.
10911091+10921092+6. References
10931093+10941094+6.1. Normative References
10951095+10961096+ [ASCII] American National Standards Institute (formerly United
10971097+ States of America Standards Institute), "USA Code for
10981098+ Information Interchange", ANSI X3.4-1968, 1968. ANSI
10991099+ X3.4-1968 has been replaced by newer versions with
11001100+ slight modifications, but the 1968 version remains
11011101+ definitive for the Internet.
11021102+11031103+ [RFC1034] Mockapetris, P., "Domain names - concepts and
11041104+ facilities", STD 13, RFC 1034, November 1987.
11051105+11061106+ [RFC1035] Mockapetris, P., "Domain names - implementation and
11071107+ specification", STD 13, RFC 1035, November 1987.
11081108+11091109+ [RFC1123] Braden, R., "Requirements for Internet Hosts -
11101110+ Application and Support", STD 3, RFC 1123, October 1989.
11111111+11121112+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
11131113+ Requirement Levels", BCP 14, RFC 2119, March 1997.
11141114+11151115+11161116+11171117+11181118+11191119+11201120+11211121+11221122+Klensin Standards Track [Page 20]
11231123+11241124+RFC 5890 IDNA Definitions August 2010
11251125+11261126+11271127+ [Unicode-UAX15]
11281128+ The Unicode Consortium, "Unicode Standard Annex #15:
11291129+ Unicode Normalization Forms, Revision 31",
11301130+ September 2009,
11311131+ <http://www.unicode.org/reports/tr15/tr15-31.html>.
11321132+11331133+ [Unicode52] The Unicode Consortium. The Unicode Standard, Version
11341134+ 5.2.0, defined by: "The Unicode Standard, Version
11351135+ 5.2.0", (Mountain View, CA: The Unicode Consortium,
11361136+ 2009. ISBN 978-1-936213-00-9).
11371137+ <http://www.unicode.org/versions/Unicode5.2.0/>.
11381138+11391139+6.2. Informative References
11401140+11411141+ [IDNA2008-Mapping]
11421142+ Resnick, P. and P. Hoffman, "Mapping Characters in
11431143+ Internationalized Domain Names for Applications (IDNA)",
11441144+ Work in Progress, April 2010.
11451145+11461146+ [RFC0952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD
11471147+ Internet host table specification", RFC 952,
11481148+ October 1985.
11491149+11501150+ [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
11511151+ Specification", RFC 2181, July 1997.
11521152+11531153+ [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
11541154+ Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
11551155+ Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
11561156+11571157+ [RFC2673] Crawford, M., "Binary Labels in the Domain Name System",
11581158+ RFC 2673, August 1999.
11591159+11601160+ [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for
11611161+ specifying the location of services (DNS SRV)",
11621162+ RFC 2782, February 2000.
11631163+11641164+ [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
11651165+ Internationalized Strings ("stringprep")", RFC 3454,
11661166+ December 2002.
11671167+11681168+ [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
11691169+ "Internationalizing Domain Names in Applications
11701170+ (IDNA)", RFC 3490, March 2003.
11711171+11721172+ [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
11731173+ Profile for Internationalized Domain Names (IDN)",
11741174+ RFC 3491, March 2003.
11751175+11761176+11771177+11781178+Klensin Standards Track [Page 21]
11791179+11801180+RFC 5890 IDNA Definitions August 2010
11811181+11821182+11831183+ [RFC3492] Costello, A., "Punycode: A Bootstring encoding of
11841184+ Unicode for Internationalized Domain Names in
11851185+ Applications (IDNA)", RFC 3492, March 2003.
11861186+11871187+ [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
11881188+ Resource Identifier (URI): Generic Syntax", STD 66,
11891189+ RFC 3986, January 2005.
11901190+11911191+ [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
11921192+ and Recommendations for Internationalized Domain Names
11931193+ (IDNs)", RFC 4690, September 2006.
11941194+11951195+ [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer
11961196+ Security (TLS) Protocol Version 1.2", RFC 5246,
11971197+ August 2008.
11981198+11991199+ [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
12001200+ October 2008.
12011201+12021202+ [RFC5891] Klensin, J., "Internationalized Domain Names in
12031203+ Applications (IDNA): Protocol", RFC 5891, August 2010.
12041204+12051205+ [RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and
12061206+ Internationalized Domain Names for Applications (IDNA)",
12071207+ RFC 5892, August 2010.
12081208+12091209+ [RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
12101210+ Internationalized Domain Names for Applications (IDNA)",
12111211+ RFC 5893, August 2010.
12121212+12131213+ [RFC5894] Klensin, J., "Internationalized Domain Names for
12141214+ Applications (IDNA): Background, Explanation, and
12151215+ Rationale", RFC 5894, August 2010.
12161216+12171217+ [Unicode-UTR36]
12181218+ The Unicode Consortium, "Unicode Technical Report #36:
12191219+ Unicode Security Considerations, Revision 7", July 2008,
12201220+ <http://www.unicode.org/reports/tr36/tr36-7.html>.
12211221+12221222+12231223+12241224+12251225+12261226+12271227+12281228+12291229+12301230+12311231+12321232+12331233+12341234+Klensin Standards Track [Page 22]
12351235+12361236+RFC 5890 IDNA Definitions August 2010
12371237+12381238+12391239+Author's Address
12401240+12411241+ John C Klensin
12421242+ 1770 Massachusetts Ave, Ste 322
12431243+ Cambridge, MA 02140
12441244+ USA
12451245+12461246+ Phone: +1 617 245 1457
12471247+ EMail: john+ietf@jck.com
12481248+12491249+12501250+12511251+12521252+12531253+12541254+12551255+12561256+12571257+12581258+12591259+12601260+12611261+12621262+12631263+12641264+12651265+12661266+12671267+12681268+12691269+12701270+12711271+12721272+12731273+12741274+12751275+12761276+12771277+12781278+12791279+12801280+12811281+12821282+12831283+12841284+12851285+12861286+12871287+12881288+12891289+12901290+Klensin Standards Track [Page 23]
12911291+
+955
spec/rfc5891.txt
···11+22+33+44+55+66+77+Internet Engineering Task Force (IETF) J. Klensin
88+Request for Comments: 5891 August 2010
99+Obsoletes: 3490, 3491
1010+Updates: 3492
1111+Category: Standards Track
1212+ISSN: 2070-1721
1313+1414+1515+ Internationalized Domain Names in Applications (IDNA): Protocol
1616+1717+Abstract
1818+1919+ This document is the revised protocol definition for
2020+ Internationalized Domain Names (IDNs). The rationale for changes,
2121+ the relationship to the older specification, and important
2222+ terminology are provided in other documents. This document specifies
2323+ the protocol mechanism, called Internationalized Domain Names in
2424+ Applications (IDNA), for registering and looking up IDNs in a way
2525+ that does not require changes to the DNS itself. IDNA is only meant
2626+ for processing domain names, not free text.
2727+2828+Status of This Memo
2929+3030+ This is an Internet Standards Track document.
3131+3232+ This document is a product of the Internet Engineering Task Force
3333+ (IETF). It represents the consensus of the IETF community. It has
3434+ received public review and has been approved for publication by the
3535+ Internet Engineering Steering Group (IESG). Further information on
3636+ Internet Standards is available in Section 2 of RFC 5741.
3737+3838+ Information about the current status of this document, any errata,
3939+ and how to provide feedback on it may be obtained at
4040+ http://www.rfc-editor.org/info/rfc5891.
4141+4242+4343+4444+4545+4646+4747+4848+4949+5050+5151+5252+5353+5454+5555+5656+5757+5858+Klensin Standards Track [Page 1]
5959+6060+RFC 5891 IDNA2008 Protocol August 2010
6161+6262+6363+Copyright Notice
6464+6565+ Copyright (c) 2010 IETF Trust and the persons identified as the
6666+ document authors. All rights reserved.
6767+6868+ This document is subject to BCP 78 and the IETF Trust's Legal
6969+ Provisions Relating to IETF Documents
7070+ (http://trustee.ietf.org/license-info) in effect on the date of
7171+ publication of this document. Please review these documents
7272+ carefully, as they describe your rights and restrictions with respect
7373+ to this document. Code Components extracted from this document must
7474+ include Simplified BSD License text as described in Section 4.e of
7575+ the Trust Legal Provisions and are provided without warranty as
7676+ described in the Simplified BSD License.
7777+7878+ This document may contain material from IETF Documents or IETF
7979+ Contributions published or made publicly available before November
8080+ 10, 2008. The person(s) controlling the copyright in some of this
8181+ material may not have granted the IETF Trust the right to allow
8282+ modifications of such material outside the IETF Standards Process.
8383+ Without obtaining an adequate license from the person(s) controlling
8484+ the copyright in such materials, this document may not be modified
8585+ outside the IETF Standards Process, and derivative works of it may
8686+ not be created outside the IETF Standards Process, except to format
8787+ it for publication as an RFC or to translate it into languages other
8888+ than English.
117117+118118+119119+Table of Contents
120120+121121+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
122122+ 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4
123123+ 3. Requirements and Applicability . . . . . . . . . . . . . . . . 5
124124+ 3.1. Requirements . . . . . . . . . . . . . . . . . . . . . . . 5
125125+ 3.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 5
126126+ 3.2.1. DNS Resource Records . . . . . . . . . . . . . . . . . 6
127127+ 3.2.2. Non-Domain-Name Data Types Stored in the DNS . . . . . 6
128128+ 4. Registration Protocol . . . . . . . . . . . . . . . . . . . . 6
129129+ 4.1. Input to IDNA Registration . . . . . . . . . . . . . . . . 7
130130+ 4.2. Permitted Character and Label Validation . . . . . . . . . 7
131131+ 4.2.1. Input Format . . . . . . . . . . . . . . . . . . . . . 7
132132+ 4.2.2. Rejection of Characters That Are Not Permitted . . . . 8
133133+ 4.2.3. Label Validation . . . . . . . . . . . . . . . . . . . 8
134134+ 4.2.4. Registration Validation Requirements . . . . . . . . . 9
135135+ 4.3. Registry Restrictions . . . . . . . . . . . . . . . . . . 9
136136+ 4.4. Punycode Conversion . . . . . . . . . . . . . . . . . . . 9
137137+ 4.5. Insertion in the Zone . . . . . . . . . . . . . . . . . . 10
138138+ 5. Domain Name Lookup Protocol . . . . . . . . . . . . . . . . . 10
139139+ 5.1. Label String Input . . . . . . . . . . . . . . . . . . . . 10
140140+ 5.2. Conversion to Unicode . . . . . . . . . . . . . . . . . . 10
141141+ 5.3. A-label Input . . . . . . . . . . . . . . . . . . . . . . 10
142142+ 5.4. Validation and Character List Testing . . . . . . . . . . 11
143143+ 5.5. Punycode Conversion . . . . . . . . . . . . . . . . . . . 13
144144+ 5.6. DNS Name Resolution . . . . . . . . . . . . . . . . . . . 13
145145+ 6. Security Considerations . . . . . . . . . . . . . . . . . . . 13
146146+ 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13
147147+ 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 13
148148+ 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 14
149149+ 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14
150150+ 10.1. Normative References . . . . . . . . . . . . . . . . . . . 14
151151+ 10.2. Informative References . . . . . . . . . . . . . . . . . . 15
152152+ Appendix A. Summary of Major Changes from IDNA2003 . . . . . . . 17
173173+174174+175175+1. Introduction
176176+177177+ This document supplies the protocol definition for Internationalized
178178+ Domain Names in Applications (IDNA), with the version specified here
179179+ known as IDNA2008. Essential definitions and terminology for
180180+ understanding this document and a road map of the collection of
181181+ documents that make up IDNA2008 appear in a separate Definitions
182182+ document [RFC5890]. Appendix A discusses the relationship between
183183+ this specification and the earlier version of IDNA (referred to here
184184+ as "IDNA2003"). The rationale for these changes, along with
185185+ considerable explanatory material and advice to zone administrators
186186+ who support IDNs, is provided in another document, known informally
187187+ in this series as the "Rationale document" [RFC5894].
188188+189189+ IDNA works by allowing applications to use certain ASCII [ASCII]
190190+ string labels (beginning with a special prefix) to represent
191191+ non-ASCII name labels. Lower-layer protocols need not be aware of
192192+ this; therefore, IDNA does not change any infrastructure. In
193193+ particular, IDNA does not depend on any changes to DNS servers,
194194+ resolvers, or DNS protocol elements, because the ASCII name service
195195+ provided by the existing DNS can be used for IDNA.
196196+197197+ IDNA applies only to a specific subset of DNS labels. The base DNS
198198+ standards [RFC1034] [RFC1035] and their various updates specify how
199199+ to combine labels into fully-qualified domain names and parse labels
200200+ out of those names.
201201+202202+ This document describes two separate protocols, one for IDN
203203+ registration (Section 4) and one for IDN lookup (Section 5). These
204204+ two protocols share some terminology, reference data, and operations.
205205+206206+2. Terminology
207207+208208+ As mentioned above, terminology used as part of the definition of
209209+ IDNA appears in the Definitions document [RFC5890]. It is worth
210210+ noting that some of this terminology overlaps with, and is consistent
211211+ with, that used in Unicode or other character set standards and the
212212+ DNS. Readers of this document are assumed to be familiar with the
213213+ associated Definitions document and with the DNS-specific terminology
214214+ in RFC 1034 [RFC1034].
215215+216216+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
217217+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
218218+ document are to be interpreted as described in BCP 14, RFC 2119
219219+ [RFC2119].
229229+230230+231231+3. Requirements and Applicability
232232+233233+3.1. Requirements
234234+235235+ IDNA makes the following requirements:
236236+237237+ 1. Whenever a domain name is put into a domain name slot that is not
238238+ IDNA-aware (see Section 2.3.2.6 of the Definitions document
239239+ [RFC5890]), it MUST contain only ASCII characters (i.e., its
240240+ labels must be either A-labels or NR-LDH labels), unless the DNS
241241+ application is not subject to historical recommendations for
242242+ "hostname"-style names (see RFC 1034 [RFC1034] and
243243+ Section 3.2.1).
244244+245245+ 2. Labels MUST be compared using equivalent forms: either both
246246+ A-label forms or both U-label forms. Because A-labels and
247247+ U-labels can be transformed into each other without loss of
248248+ information, these comparisons are equivalent (however, in
249249+ practice, comparison of U-labels requires first verifying that
250250+ they actually are U-labels and not just Unicode strings). A pair
251251+ of A-labels MUST be compared as case-insensitive ASCII (as with
252252+ all comparisons of ASCII DNS labels). U-labels MUST be compared
253253+ as-is, without case folding or other intermediate steps. While
254254+ it is not necessary to validate labels in order to compare them,
255255+ successful comparison does not imply validity. In many cases,
256256+ not limited to comparison, validation may be important for other
257257+ reasons and SHOULD be performed.
258258+259259+ 3. Labels being registered MUST conform to the requirements of
260260+ Section 4. Labels being looked up and the lookup process MUST
261261+ conform to the requirements of Section 5.
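
Requirement 2 above could be modelled along the following lines. This is only a sketch: the `label` type and `labels_equal` function are hypothetical names, and the code assumes that the caller already knows which form each label is in and that any U-label has been validated elsewhere. Conversion between forms is not shown, so a mixed pair yields `None` rather than a guess.

```ocaml
(* A label tagged with the form it is known to be in. *)
type label =
  | A_label of string  (* ASCII form, e.g. "xn--bcher-kva" *)
  | U_label of string  (* Unicode form (UTF-8), assumed already validated *)

(* Compare two labels in equivalent forms, per requirement 2.
   Returns None when the forms differ, because one side would first
   have to be converted to the other form. *)
let labels_equal (x : label) (y : label) : bool option =
  match x, y with
  | A_label a, A_label b ->
      (* A-labels: case-insensitive ASCII comparison. *)
      Some (String.lowercase_ascii a = String.lowercase_ascii b)
  | U_label a, U_label b ->
      (* U-labels: compared as-is, without case folding. *)
      Some (String.equal a b)
  | A_label _, U_label _ | U_label _, A_label _ -> None
```

Returning an option keeps the "equivalent forms" rule visible in the type: a caller cannot accidentally compare an A-label with a U-label byte for byte.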
262262+263263+3.2. Applicability
264264+265265+ IDNA applies to all domain names in all domain name slots in
266266+ protocols except where it is explicitly excluded. It does not apply
267267+ to domain name slots that do not use the LDH syntax rules as
268268+ described in the Definitions document [RFC5890].
269269+270270+ Because it uses the DNS, IDNA applies to many protocols that were
271271+ specified before it was designed. IDNs occupying domain name slots
272272+ in those older protocols MUST be in A-label form until and unless
273273+ those protocols and their implementations are explicitly upgraded to
274274+ be aware of IDNs and to accept the U-label form. IDNs actually
275275+ appearing in DNS queries or responses MUST be A-labels.
285285+286286+287287+ IDNA-aware protocols and implementations MAY accept U-labels,
288288+ A-labels, or both as those particular protocols specify. IDNA is not
289289+ defined for extended label types (see RFC 2671 [RFC2671], Section 3).
290290+291291+3.2.1. DNS Resource Records
292292+293293+ IDNA applies only to domain names in the NAME and RDATA fields of DNS
294294+ resource records whose CLASS is IN. See the DNS specification
295295+ [RFC1035] for precise definitions of these terms.
296296+297297+ The application of IDNA to DNS resource records depends entirely on
298298+ the CLASS of the record, and not on the TYPE except as noted below.
299299+ This will remain true, even as new TYPEs are defined, unless a new
300300+ TYPE defines TYPE-specific rules. Special naming conventions for SRV
301301+ records (and "underscore labels" more generally) are incompatible
302302+ with IDNA coding as discussed in the Definitions document [RFC5890],
303303+ especially Section 2.3.2.3. Of course, underscore labels may be part
304304+ of a domain that uses IDN labels at higher levels in the tree.
305305+306306+3.2.2. Non-Domain-Name Data Types Stored in the DNS
307307+308308+ Although IDNA enables the representation of non-ASCII characters in
309309+ domain names, that does not imply that IDNA enables the
310310+ representation of non-ASCII characters in other data types that are
311311+ stored in domain names, specifically in the RDATA field for types
312312+ that have structured RDATA format. For example, an email address
313313+ local part is stored in a domain name in the RNAME field as part of
314314+ the RDATA of an SOA record (e.g., hostmaster@example.com would be
315315+ represented as hostmaster.example.com). IDNA does not update the
316316+ existing email standards, which allow only ASCII characters in local
317317+ parts. Even though work is in progress to define
318318+ internationalization for email addresses [RFC4952], changes to the
319319+ email address part of the SOA RDATA would require action in, or
320320+ updates to, other standards, specifically those that specify the
321321+ format of the SOA RR.
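
The RNAME example above (hostmaster@example.com represented as hostmaster.example.com) can be illustrated with a short sketch. The function name `rname_of_mailbox` is hypothetical. The code assumes an ASCII mailbox with a single "@" separating local part and domain, and it backslash-escapes dots in the local part, following the master-file convention described in RFC 1035.

```ocaml
(* Convert "local@domain" to the RNAME presentation form, in which the
   first label holds the local part. Dots inside the local part are
   backslash-escaped so that they are not read as label separators. *)
let rname_of_mailbox (mailbox : string) : string option =
  match String.index_opt mailbox '@' with
  | None -> None
  | Some i ->
      let local = String.sub mailbox 0 i in
      let domain =
        String.sub mailbox (i + 1) (String.length mailbox - i - 1)
      in
      if local = "" || domain = "" then None
      else
        let escaped =
          String.concat "\\." (String.split_on_char '.' local)
        in
        Some (escaped ^ "." ^ domain)
```

Because the local part becomes an ordinary label, it is bound by the ASCII-only rule of the email standards rather than by IDNA, which is the point the paragraph above makes.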
322322+323323+4. Registration Protocol
324324+325325+ This section defines the model for registering an IDN. The model is
326326+ implementation independent; any sequence of steps that produces
327327+ exactly the same result for all labels is considered a valid
328328+ implementation.
329329+330330+ Note that, while the registration (this section) and lookup protocols
331331+ (Section 5) are very similar in most respects, they are not
332332+ identical, and implementers should carefully follow the steps
333333+ described in this specification.
341341+342342+343343+4.1. Input to IDNA Registration
344344+345345+ Registration processes, especially processing by entities (often
346346+ called "registrars") who deal with registrants before the request
347347+ actually reaches the zone manager ("registry") are outside the scope
348348+ of this definition and may differ significantly depending on local
349349+ needs. By the time a string enters the IDNA registration process as
350350+ described in this specification, it MUST be in Unicode and in
351351+ Normalization Form C (NFC [Unicode-UAX15]). Entities responsible for
352352+ zone files ("registries") MUST accept only the exact string for which
353353+ registration is requested, free of any mappings or local adjustments.
354354+ They MAY accept that input in any of three forms:
355355+356356+ 1. As a pair of A-label and U-label.
357357+358358+ 2. As an A-label only.
359359+360360+ 3. As a U-label only.
361361+362362+ The first two of these forms are RECOMMENDED because the use of
363363+ A-labels avoids any possibility of ambiguity. The first is normally
364364+ preferred over the second because it permits further verification of
365365+ user intent (see Section 4.2.1).
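
One way to picture the three input forms is as a variant type. This is a sketch only; the type and function names are hypothetical and do not come from this specification or from the library in this repository.

```ocaml
(* The three input forms a registry MAY accept (Section 4.1). *)
type registration_input =
  | Both of { a_label : string; u_label : string }  (* form 1 *)
  | A_only of string                                (* form 2 *)
  | U_only of string                                (* form 3 *)

(* Forms 1 and 2 include an A-label and are the RECOMMENDED ones. *)
let is_recommended_form = function
  | Both _ | A_only _ -> true
  | U_only _ -> false
```

Keeping the pair in a single constructor makes it natural to run the cross-checks of Section 4.2.1 whenever both forms are present.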
366366+367367+4.2. Permitted Character and Label Validation
368368+369369+4.2.1. Input Format
370370+371371+ If both the U-label and A-label forms are available, the registry
372372+ MUST ensure that the A-label form is in lowercase, perform a
373373+ conversion to a U-label, perform the steps and tests described below
374374+ on that U-label, and then verify that the A-label produced by the
375375+ step in Section 4.4 matches the one provided as input. In addition,
376376+ the U-label that was provided as input and the one obtained by
377377+ conversion of the A-label MUST match exactly. If, for some reason,
378378+ these tests fail, the registration MUST be rejected.
379379+380380+ If only an A-label was provided and the conversion to a U-label is
381381+ not performed, the registry MUST still verify that the A-label is
382382+ superficially valid, i.e., that it does not violate any of the rules
383383+ of Punycode encoding [RFC3492] such as the prohibition on trailing
384384+ hyphen-minus, the requirement that all characters be ASCII, and so
385385+ on. Strings that appear to be A-labels (e.g., they start with
386386+ "xn--") and strings that are supplied to the registry in a context
387387+ reserved for A-labels (such as a field in a form to be filled out),
388388+ but that are not valid A-labels as described in this paragraph, MUST
389389+ NOT be placed in DNS zones that support IDNA.
397397+398398+399399+ If only an A-label is provided, the conversion to a U-label is not
400400+ performed, but the superficial tests described in the previous
401401+ paragraph are performed, registration procedures MAY, and usually
402402+ will, bypass the tests and actions in the balance of Section 4.2 and
403403+ in Sections 4.3 and 4.4.
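
The superficial A-label tests described in this section might look like the sketch below. The function names are hypothetical. The text above names only some of the checks ("and so on"), so the exact set used here is an assumption: the ACE prefix, ASCII letters, digits, and hyphen only, no trailing hyphen, and the 63-octet DNS label limit. The sketch does not decode the Punycode payload, so passing it does not show that the string is a valid A-label.

```ocaml
(* Letters, digits, and hyphen: the characters allowed in LDH labels. *)
let is_ldh = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '-' -> true
  | _ -> false

(* True when every character of [s] satisfies [p]. *)
let all_chars p s =
  let ok = ref true in
  String.iter (fun c -> if not (p c) then ok := false) s;
  !ok

(* True when [s] starts with the ACE prefix, in any letter case. *)
let has_ace_prefix s =
  String.length s >= 4
  && String.lowercase_ascii (String.sub s 0 4) = "xn--"

(* Superficial checks only: the Punycode payload is not decoded. *)
let superficially_valid_a_label (s : string) : bool =
  let len = String.length s in
  has_ace_prefix s
  && len > 4               (* something must follow the prefix *)
  && len <= 63             (* DNS label length limit *)
  && s.[len - 1] <> '-'    (* no trailing hyphen-minus *)
  && all_chars is_ldh s    (* ASCII letters, digits, hyphen only *)
```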
404404+405405+4.2.2. Rejection of Characters That Are Not Permitted
406406+407407+ The candidate Unicode string MUST NOT contain characters that appear
408408+ in the "DISALLOWED" and "UNASSIGNED" lists specified in the Tables
409409+ document [RFC5892].
410410+411411+4.2.3. Label Validation
412412+413413+ The proposed label (in the form of a Unicode string, i.e., a string
414414+ that at least superficially appears to be a U-label) is then examined
415415+ using tests that require examination of more than one character.
416416+ Character order is considered to be the on-the-wire order. That
417417+ order may not be the same as the display order.
418418+419419+4.2.3.1. Hyphen Restrictions
420420+421421+ The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
422422+ the third and fourth character positions and MUST NOT start or end
423423+ with a "-" (hyphen).
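
The hyphen restrictions can be expressed as a small predicate. This sketch assumes the label has already been decoded into an array of Unicode code points in on-the-wire order, so that "third and fourth character positions" refers to characters rather than bytes. The name `passes_hyphen_rules` is hypothetical, and treating the empty label as failing is an assumption of the sketch.

```ocaml
(* U+002D HYPHEN-MINUS *)
let hyphen = 0x2D

(* [cps] holds the label's code points in on-the-wire order. *)
let passes_hyphen_rules (cps : int array) : bool =
  let n = Array.length cps in
  (* "--" in the third and fourth positions (indices 2 and 3). *)
  let double_hyphen_at_3_4 =
    n >= 4 && cps.(2) = hyphen && cps.(3) = hyphen
  in
  n > 0
  && cps.(0) <> hyphen
  && cps.(n - 1) <> hyphen
  && not double_hyphen_at_3_4
```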
424424+425425+4.2.3.2. Leading Combining Marks
426426+427427+ The Unicode string MUST NOT begin with a combining mark or combining
428428+ character (see The Unicode Standard, Section 2.11 [Unicode] for an
429429+ exact definition).
430430+431431+4.2.3.3. Contextual Rules
432432+433433+ The Unicode string MUST NOT contain any characters whose validity is
434434+ context-dependent, unless the validity is positively confirmed by a
435435+ contextual rule. To check this, each code point identified as
436436+ CONTEXTJ or CONTEXTO in the Tables document [RFC5892] MUST have a
437437+ non-null rule. If such a code point is missing a rule, the label is
438438+ invalid. If the rule exists but the result of applying the rule is
439439+ negative or inconclusive, the proposed label is invalid.
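
The decision logic of this section can be sketched as follows. The actual rules and the CONTEXTJ/CONTEXTO classification live in the Tables document [RFC5892]; here they are stood in for by two hypothetical lookup functions, `needs_rule` and `rule_for`, supplied by the caller. Only the control flow is illustrated: a missing rule, a negative result, or an inconclusive result each make the label invalid.

```ocaml
(* Outcome of evaluating a contextual rule at one position. *)
type rule_result = Positive | Negative | Inconclusive

(* A rule inspects the whole label and the index of the code point. *)
type rule = int array -> int -> rule_result

(* [needs_rule cp] says whether [cp] is CONTEXTJ or CONTEXTO, and
   [rule_for cp] returns its rule, if one exists. Both are hypothetical
   stand-ins for the tables of RFC 5892. *)
let passes_contextual_rules
    ~(needs_rule : int -> bool)
    ~(rule_for : int -> rule option)
    (cps : int array) : bool =
  let ok = ref true in
  Array.iteri
    (fun i cp ->
      if needs_rule cp then
        match rule_for cp with
        | None -> ok := false  (* missing rule: label is invalid *)
        | Some rule ->
            (* Negative and inconclusive results both invalidate. *)
            if rule cps i <> Positive then ok := false)
    cps;
  !ok
```

Passing the tables in as functions keeps the sketch independent of any particular Unicode version.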
440440+441441+4.2.3.4. Labels Containing Characters Written Right to Left
442442+443443+ If the proposed label contains any characters from scripts that are
444444+ written from right to left, it MUST meet the Bidi criteria [RFC5893].
453453+454454+455455+4.2.4. Registration Validation Requirements
456456+457457+ Strings that contain at least one non-ASCII character, have been
458458+ produced by the steps above, whose contents pass all of the tests in
459459+ Section 4.2.3, and are 63 or fewer characters long in
460460+ ASCII-compatible encoding (ACE) form (see Section 4.4), are U-labels.
461461+462462+ To summarize, tests are made in Section 4.2 for invalid characters,
463463+ invalid combinations of characters, for labels that are invalid even
464464+ if the characters they contain are valid individually, and for labels
465465+ that do not conform to the restrictions for strings containing
466466+ right-to-left characters.
467467+468468+4.3. Registry Restrictions
469469+470470+ In addition to the rules and tests above, there are many reasons why
471471+ a registry could reject a label. Registries at all levels of the
472472+ DNS, not just the top level, are expected to establish policies about
473473+ label registrations. Policies are likely to be informed by the local
474474+ languages and the scripts that are used to write them and may depend
475475+ on many factors including what characters are in the label (for
476476+ example, a label may be rejected based on other labels already
477477+ registered). See the Rationale document [RFC5894], Section 3.2, for
478478+ further discussion and recommendations about registry policies.
479479+480480+ The string produced by the steps in Section 4.2 is checked and
481481+ processed as appropriate to local registry restrictions. Application
482482+ of those registry restrictions may result in the rejection of some
483483+ labels or the application of special restrictions to others.
484484+485485+4.4. Punycode Conversion
486486+487487+ The resulting U-label is converted to an A-label (defined in Section
488488+ 2.3.2.1 of the Definitions document [RFC5890]). The A-label is the
489489+ encoding of the U-label according to the Punycode algorithm [RFC3492]
490490+ with the ACE prefix "xn--" added at the beginning of the string. The
491491+ resulting string must, of course, conform to the length limits
492492+ imposed by the DNS. This document does not update or alter the
493493+ Punycode algorithm specified in RFC 3492 in any way. RFC 3492 does
494494+ make a non-normative reference to the information about the value and
495495+ construction of the ACE prefix that appears in RFC 3490 or Nameprep
496496+ [RFC3491]. For consistency and reader convenience, IDNA2008
497497+ effectively updates that reference to point to this document. That
498498+ change does not alter the prefix itself. The prefix, "xn--", is the
499499+ same in both sets of documents.
509509+510510+511511+ With the exception of the maximum string length test on Punycode
512512+ output, the failure conditions identified in the Punycode encoding
513513+ procedure cannot occur if the input is a U-label as determined by the
514514+ steps in Sections 4.1 through 4.3 above.
515515+516516+4.5. Insertion in the Zone
517517+518518+ The label is registered in the DNS by inserting the A-label into a
519519+ zone.
520520+521521+5. Domain Name Lookup Protocol
522522+523523+ Lookup is different from registration and different tests are applied
524524+ on the client. Although some validity checks are necessary to avoid
525525+ serious problems with the protocol, the lookup-side tests are more
526526+ permissive and rely on the assumption that names that are present in
527527+ the DNS are valid. That assumption is, however, a weak one because
528528+ the presence of wildcards in the DNS might cause a string that is not
529529+ actually registered in the DNS to be successfully looked up.
530530+531531+5.1. Label String Input
532532+533533+ The user supplies a string in the local character set, for example,
534534+ by typing it, clicking on it, or copying and pasting it from a
535535+ resource identifier, e.g., a Uniform Resource Identifier (URI)
536536+ [RFC3986] or an Internationalized Resource Identifier (IRI)
537537+ [RFC3987], from which the domain name is extracted. Alternately,
538538+ some process not directly involving the user may read the string from
539539+ a file or obtain it in some other way. Processing in this step and
540540+ the one specified in Section 5.2 are local matters, to be
541541+ accomplished prior to actual invocation of IDNA.
542542+543543+5.2. Conversion to Unicode
544544+545545+ The string is converted from the local character set into Unicode, if
546546+ it is not already in Unicode. Depending on local needs, this
547547+ conversion may involve mapping some characters into other characters
548548+ as well as coding conversions. Those issues are discussed in the
549549+ mapping-related sections (Sections 4.2, 4.4, 6, and 7.3) of the
550550+ Rationale document [RFC5894] and in the separate Mapping document
551551+ [IDNA2008-Mapping]. The result MUST be a Unicode string in NFC form.
552552+553553+5.3. A-label Input
554554+555555+ If the input to this procedure appears to be an A-label (i.e., it
556556+ starts in "xn--", interpreted case-insensitively), the lookup
557557+ application MAY attempt to convert it to a U-label, first ensuring
558558+ that the A-label is entirely in lowercase (converting it to lowercase
565565+566566+567567+ if necessary), and apply the tests of Section 5.4 and the conversion
568568+ of Section 5.5 to that form. If the label is converted to Unicode
569569+ (i.e., to U-label form) using the Punycode decoding algorithm, then
570570+ the processing specified in those two sections MUST be performed, and
571571+ the label MUST be rejected if the resulting label is not identical to
572572+ the original. See Section 8.1 of the Rationale document [RFC5894]
573573+ for additional discussion on this topic.
574574+575575+ Conversion from the A-label and testing that the result is a U-label
576576+ SHOULD be performed if the domain name will later be presented to the
577577+ user in native character form (this requires that the lookup
578578+ application be IDNA-aware). If those steps are not performed, the
579579+ lookup process SHOULD at least test to determine that the string is
580580+ actually an A-label, examining it for the invalid formats specified
581581+ in the Punycode decoding specification. Applications that are not
582582+ IDNA-aware will obviously omit that testing; others MAY treat the
583583+ string as opaque to avoid the additional processing at the expense of
584584+ providing less protection and information to users.
585585+586586+5.4. Validation and Character List Testing
587587+588588+ As with the registration procedure described in Section 4, the
589589+ Unicode string is checked to verify that all characters that appear
590590+ in it are valid as input to IDNA lookup processing. As discussed
591591+ above and in the Rationale document [RFC5894], the lookup check is
592592+ more liberal than the registration one. Labels that have not been
593593+ fully evaluated for conformance to the applicable rules are referred
594594+ to as "putative" labels as discussed in Section 2.3.2.1 of the
595595+ Definitions document [RFC5890]. Putative U-labels with any of the
596596+ following characteristics MUST be rejected prior to DNS lookup:
597597+598598+ o Labels that are not in NFC [Unicode-UAX15].
599599+600600+ o Labels containing "--" (two consecutive hyphens) in the third and
601601+ fourth character positions.
602602+603603+ o Labels whose first character is a combining mark (see The Unicode
604604+ Standard, Section 2.11 [Unicode]).
605605+606606+ o Labels containing prohibited code points, i.e., those that are
607607+ assigned to the "DISALLOWED" category of the Tables document
608608+ [RFC5892].
609609+610610+ o Labels containing code points that are identified in the Tables
611611+ document as "CONTEXTJ", i.e., requiring exceptional contextual
612612+ rule processing on lookup, but that do not conform to those rules.
613613+ Note that this implies that a rule must be defined, not null: a
621621+622622+623623+ character that requires a contextual rule but for which the rule
624624+ is null is treated in this step as having failed to conform to the
625625+ rule.
626626+627627+ o Labels containing code points that are identified in the Tables
628628+ document as "CONTEXTO", but for which no such rule appears in the
629629+ table of rules. Applications resolving DNS names or carrying out
630630+ equivalent operations are not required to test contextual rules
631631+ for "CONTEXTO" characters, only to verify that a rule is defined
632632+ (although they MAY make such tests to provide better protection or
633633+ give better information to the user).
634634+635635+ o Labels containing code points that are unassigned in the version
636636+ of Unicode being used by the application, i.e., in the UNASSIGNED
637637+ category of the Tables document.
638638+639639+ This requirement means that the application must use a list of
640640+ unassigned characters that is matched to the version of Unicode
641641+ that is being used for the other requirements in this section. It
642642+ is not required that the application know which version of Unicode
643643+ is being used; that information might be part of the operating
644644+ environment in which the application is running.
645645+646646+ In addition, the application SHOULD apply the following test.
647647+648648+ o Verification that the string is compliant with the requirements
649649+ for right-to-left characters specified in the Bidi document
650650+ [RFC5893].
651651+652652+ This test may be omitted in special circumstances, such as when the
653653+ lookup application knows that the conditions are enforced elsewhere,
654654+ because an attempt to look up and resolve such strings will almost
655655+ certainly lead to a DNS lookup failure except when wildcards are
656656+ present in the zone. However, applying the test is likely to give
657657+ much better information about the reason for a lookup failure --
658658+ information that may be usefully passed to the user when that is
659659+ feasible -- than DNS resolution failure information alone.
660660+661661+ For all other strings, the lookup application MUST rely on the
662662+ presence or absence of labels in the DNS to determine the validity of
663663+ those labels and the validity of the characters they contain. If
664664+ they are registered, they are presumed to be valid; if they are not,
665665+ their possible validity is not relevant. While a lookup application
666666+ may reasonably issue warnings about strings it believes may be
667667+ problematic, applications that decline to process a string that
668668+ conforms to the rules above (i.e., does not look it up in the DNS)
669669+ are not in conformance with this protocol.
677677+678678+679679+5.5. Punycode Conversion
680680+681681+ The string that has now been validated for lookup is converted to ACE
682682+ form by applying the Punycode algorithm to the string and then adding
683683+ the ACE prefix ("xn--").
684684+685685+5.6. DNS Name Resolution
686686+687687+ The A-label resulting from the conversion in Section 5.5 or supplied
688688+ directly (see Section 5.3) is combined with other labels as needed to
689689+ form a fully-qualified domain name that is then looked up in the DNS,
690690+ using normal DNS resolver procedures. The lookup can obviously
691691+ either succeed (returning information) or fail.
692692+693693+6. Security Considerations
694694+695695+ Security Considerations for this version of IDNA are described in the
696696+ Definitions document [RFC5890], except for the special issues
697697+ associated with right-to-left scripts and characters. The latter are
698698+ discussed in the Bidi document [RFC5893].
699699+700700+ In order to avoid intentional or accidental attacks from labels that
701701+ might be confused with others, special problems in rendering, and so
702702+ on, the IDNA model requires that registries exercise care and
703703+ thoughtfulness about what labels they choose to permit. That issue
704704+ is discussed in Section 4.3 of this document which, in turn, points
705705+ to a somewhat more extensive discussion in the Rationale document
706706+ [RFC5894].
707707+708708+7. IANA Considerations
709709+710710+ IANA actions for this version of IDNA are specified in the Tables
711711+ document [RFC5892] and discussed informally in the Rationale document
712712+ [RFC5894]. The components of IDNA described in this document do not
713713+ require any IANA actions.
714714+715715+8. Contributors
716716+717717+ While the listed editor held the pen, the original versions of this
718718+ document represent the joint work and conclusions of an ad hoc design
719719+ team consisting of the editor and, in alphabetic order, Harald
720720+ Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. This document
721721+ draws significantly on the original version of IDNA [RFC3490] both
722722+ conceptually and for specific text. This second-generation version
723723+ would not have been possible without the work that went into that
724724+ first version and especially the contributions of its authors Patrik
725725+ Faltstrom, Paul Hoffman, and Adam Costello. While Faltstrom was
733733+734734+735735+ actively involved in the creation of this version, Hoffman and
736736+ Costello were not and should not be held responsible for any errors
737737+ or omissions.
738738+739739+9. Acknowledgments
740740+741741+ This revision to IDNA would have been impossible without the
742742+ accumulated experience since RFC 3490 was published and resulting
743743+ comments and complaints of many people in the IETF, ICANN, and other
744744+ communities (too many people to list here). Nor would it have been
745745+ possible without RFC 3490 itself and the efforts of the Working Group
746746+ that defined it. Those people whose contributions are acknowledged
747747+ in RFC 3490, RFC 4690 [RFC4690], and the Rationale document [RFC5894]
748748+ were particularly important.
749749+750750+ Specific textual changes were incorporated into this document after
751751+ suggestions from the other contributors, Stephane Bortzmeyer, Vint
752752+ Cerf, Lisa Dusseault, Paul Hoffman, Kent Karlsson, James Mitchell,
753753+ Erik van der Poel, Marcos Sanz, Andrew Sullivan, Wil Tan, Ken
754754+ Whistler, Chris Wright, and other WG participants and reviewers
755755+ including Martin Duerst, James Mitchell, Subramanian Moonesamy, Peter
756756+ Saint-Andre, Margaret Wasserman, and Dan Winship who caught specific
757757+ errors and recommended corrections. Special thanks are due to Paul
758758+ Hoffman for permission to extract material to form the basis for
759759+ Appendix A from a draft document that he prepared.
760760+761761+10. References
762762+763763+10.1. Normative References
764764+765765+ [RFC1034] Mockapetris, P., "Domain names - concepts and
766766+ facilities", STD 13, RFC 1034, November 1987.
767767+768768+ [RFC1035] Mockapetris, P., "Domain names - implementation and
769769+ specification", STD 13, RFC 1035, November 1987.
770770+771771+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
772772+ Requirement Levels", BCP 14, RFC 2119, March 1997.
773773+774774+ [RFC3492] Costello, A., "Punycode: A Bootstring encoding of
775775+ Unicode for Internationalized Domain Names in
776776+ Applications (IDNA)", RFC 3492, March 2003.
777777+778778+ [RFC5890] Klensin, J., "Internationalized Domain Names for
779779+ Applications (IDNA): Definitions and Document
780780+ Framework", RFC 5890, August 2010.
789789+790790+791791+ [RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and
792792+ Internationalized Domain Names for Applications (IDNA)",
793793+ RFC 5892, August 2010.
794794+795795+ [RFC5893] Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
796796+ for Internationalized Domain Names for Applications
797797+ (IDNA)", RFC 5893, August 2010.
798798+799799+ [Unicode-UAX15]
800800+ The Unicode Consortium, "Unicode Standard Annex #15:
801801+ Unicode Normalization Forms", September 2009,
802802+ <http://www.unicode.org/reports/tr15/>.
803803+804804+10.2. Informative References
805805+806806+ [ASCII] American National Standards Institute (formerly United
807807+ States of America Standards Institute), "USA Code for
808808+ Information Interchange", ANSI X3.4-1968, 1968. ANSI
809809+ X3.4-1968 has been replaced by newer versions with
810810+ slight modifications, but the 1968 version remains
811811+ definitive for the Internet.
812812+813813+ [IDNA2008-Mapping]
814814+ Resnick, P. and P. Hoffman, "Mapping Characters in
815815+ Internationalized Domain Names for Applications (IDNA)",
816816+ Work in Progress, April 2010.
817817+818818+ [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
819819+ RFC 2671, August 1999.
820820+821821+ [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
822822+ "Internationalizing Domain Names in Applications
823823+ (IDNA)", RFC 3490, March 2003.
824824+825825+ [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
826826+ Profile for Internationalized Domain Names (IDN)",
827827+ RFC 3491, March 2003.
828828+829829+ [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
830830+ Resource Identifier (URI): Generic Syntax", STD 66,
831831+ RFC 3986, January 2005.
832832+833833+ [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
834834+ Identifiers (IRIs)", RFC 3987, January 2005.
835835+836836+ [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
837837+ and Recommendations for Internationalized Domain Names
838838+ (IDNs)", RFC 4690, September 2006.
845845+846846+847847+ [RFC4952] Klensin, J. and Y. Ko, "Overview and Framework for
848848+ Internationalized Email", RFC 4952, July 2007.
849849+850850+ [RFC5894] Klensin, J., "Internationalized Domain Names for
851851+ Applications (IDNA): Background, Explanation, and
852852+ Rationale", RFC 5894, August 2010.
853853+854854+ [Unicode] The Unicode Consortium, "The Unicode Standard, Version
855855+ 5.0", 2007. Boston, MA, USA: Addison-Wesley. ISBN
856856+ 0-321-48091-0. This printed reference has now been
857857+ updated online to reflect additional code points. For
858858+ code points, the reference at the time this document was
859859+ published is to Unicode 5.2.
901901+902902+903903+Appendix A. Summary of Major Changes from IDNA2003
904904+905905+ 1. Update base character set from Unicode 3.2 to Unicode version
906906+ agnostic.
907907+908908+ 2. Separate the definitions for the "registration" and "lookup"
909909+ activities.
910910+911911+ 3. Disallow symbol and punctuation characters except where special
912912+ exceptions are necessary.
913913+914914+ 4. Remove the mapping and normalization steps from the protocol and
915915+ have them, instead, done by the applications themselves,
916916+ possibly in a local fashion, before invoking the protocol.
917917+918918+ 5. Change the way that the protocol specifies which characters are
919919+ allowed in labels from "humans decide what the table of code
920920+ points contains" to "decisions about code points are based on
921921+ Unicode properties plus a small exclusion list created by
922922+ humans".
923923+924924+ 6. Introduce the new concept of characters that can be used only in
925925+ specific contexts.
926926+927927+ 7. Allow typical words and names in languages such as Dhivehi and
928928+ Yiddish to be expressed.
929929+930930+ 8. Make bidirectional domain names (delimited strings of labels,
931931+ not just labels standing on their own) display in a less
932932+ surprising fashion, whether they appear in obvious domain name
933933+ contexts or as part of running text in paragraphs.
934934+935935+ 9. Remove the dot separator from the mandatory part of the
936936+ protocol.
937937+938938+ 10. Make some currently valid labels that are not actually IDNA
939939+ labels invalid.
940940+941941+Author's Address
942942+943943+ John C Klensin
944944+ 1770 Massachusetts Ave, Ste 322
945945+ Cambridge, MA 02140
946946+ USA
947947+948948+ Phone: +1 617 245 1457
949949+ EMail: john+ietf@jck.com
955955+