OCaml implementation of the Mozilla Public Suffix service

rfcs

+10421 -5
+12
README.md
···
 opam exec -- dune build @doc
 ```
 
+## Technical Standards
+
+This library is built on the following Internet standards:
+
+- **[RFC 1034](https://datatracker.ietf.org/doc/html/rfc1034)** - Domain Names: Concepts and Facilities
+- **[RFC 1035](https://datatracker.ietf.org/doc/html/rfc1035)** - Domain Names: Implementation and Specification
+- **[RFC 3492](https://datatracker.ietf.org/doc/html/rfc3492)** - Punycode: A Bootstring encoding of Unicode for IDNA
+- **[RFC 5890](https://datatracker.ietf.org/doc/html/rfc5890)** - IDNA: Definitions and Document Framework
+- **[RFC 5891](https://datatracker.ietf.org/doc/html/rfc5891)** - IDNA: Protocol
+
+RFC specifications are available in the `spec/` directory for reference.
+
 ## License
 
 ISC
+46 -5
lib/publicsuffix.mli
···
 For example, for the domain [www.example.com], the public suffix is [.com]
 and the registrable domain is [example.com].
 
+Domain names follow the specifications in {{:https://datatracker.ietf.org/doc/html/rfc1034}RFC 1034}
+and {{:https://datatracker.ietf.org/doc/html/rfc1035}RFC 1035}, which define
+the Domain Name System concepts and implementation.
+
 {1 Sections}
 
 The PSL is divided into two sections:
···
 {1 Internationalized Domain Names}
 
 The library handles internationalized domain names (IDN) by converting them
-to Punycode (ASCII-compatible encoding) before lookup. Both Unicode and
-Punycode input are accepted:
+to Punycode (ASCII-compatible encoding) before lookup, following the IDNA2008
+protocol defined in {{:https://datatracker.ietf.org/doc/html/rfc5890}RFC 5890}
+(IDNA Definitions) and {{:https://datatracker.ietf.org/doc/html/rfc5891}RFC 5891}
+(IDNA Protocol).
+
+Punycode encoding, specified in {{:https://datatracker.ietf.org/doc/html/rfc3492}RFC 3492},
+uniquely and reversibly transforms Unicode strings into ASCII strings.
+IDNA marks each encoded label with the "xn--" prefix (ACE prefix). Both
+Unicode and Punycode input are accepted:
 
 {[
   Publicsuffix.registrable_domain psl "www.食狮.com.cn"
···
   Publicsuffix.public_suffix psl "example.com."
   (* Returns: Ok "com." *)
 ]}
+
+{1 References}
+
+This library implementation is based on the following specifications:
+
+{ul
+{- {{:https://publicsuffix.org/list/} Public Suffix List Specification} - The algorithm and list format}
+{- {{:https://datatracker.ietf.org/doc/html/rfc1034}RFC 1034} - Domain Names: Concepts and Facilities}
+{- {{:https://datatracker.ietf.org/doc/html/rfc1035}RFC 1035} - Domain Names: Implementation and Specification}
+{- {{:https://datatracker.ietf.org/doc/html/rfc3492}RFC 3492} - Punycode: A Bootstring encoding of Unicode for IDNA}
+{- {{:https://datatracker.ietf.org/doc/html/rfc5890}RFC 5890} - Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework}
+{- {{:https://datatracker.ietf.org/doc/html/rfc5891}RFC 5891} - Internationalized Domain Names in Applications (IDNA): Protocol}}
 *)
 
 (** {1 Types} *)
···
   | Empty_domain
       (** The input domain was empty *)
   | Invalid_domain of string
-      (** The domain could not be parsed as a valid domain name *)
+      (** The domain could not be parsed as a valid domain name.
+          Domain names must conform to the syntax specified in
+          {{:https://datatracker.ietf.org/doc/html/rfc1035}RFC 1035}. *)
   | Leading_dot
-      (** The domain has a leading dot (e.g., [.example.com]) *)
+      (** The domain has a leading dot (e.g., [.example.com]).
+          Per {{:https://datatracker.ietf.org/doc/html/rfc1035}RFC 1035},
+          domain names should not have leading dots. *)
   | Punycode_error of string
-      (** Failed to convert internationalized domain to Punycode *)
+      (** Failed to convert internationalized domain to Punycode encoding.
+          See {{:https://datatracker.ietf.org/doc/html/rfc3492}RFC 3492}
+          for Punycode encoding requirements and
+          {{:https://datatracker.ietf.org/doc/html/rfc5891}RFC 5891}
+          for IDNA protocol requirements. *)
   | No_public_suffix
       (** The domain has no public suffix (should not happen with valid domains) *)
   | Domain_is_public_suffix
···
 - Exception rules ([!]) take priority over all other rules
 - If no rules match, the implicit [*] rule applies (returns the TLD)
 
+Domain names are processed according to
+{{:https://datatracker.ietf.org/doc/html/rfc1035}RFC 1035} syntax.
+Internationalized domain names (IDN) are automatically converted to
+Punycode per {{:https://datatracker.ietf.org/doc/html/rfc3492}RFC 3492}
+before matching.
+
 @param t The PSL instance
 @param domain The domain name to query (Unicode or Punycode)
 @return [Ok suffix] with the public suffix, or [Error e] on failure
···
 
 The registrable domain is the public suffix plus one additional label.
 This is the highest-level domain that can be registered by a user.
+
+Domain labels follow the naming restrictions specified in
+{{:https://datatracker.ietf.org/doc/html/rfc1035}RFC 1035}. Internationalized
+domain names are handled per {{:https://datatracker.ietf.org/doc/html/rfc5891}RFC 5891}.
 
 @param t The PSL instance
 @param domain The domain name to query
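The interface comments above describe the registrable domain as the public suffix plus one additional label, with labels restricted to RFC 1035 syntax. The following is a minimal, self-contained sketch of those two ideas, not the library's implementation. All three helpers (`is_valid_label`, `labels_of_domain`, `registrable_with_suffix`) are hypothetical names, the public suffix is supplied by hand instead of being looked up in the PSL, and only ASCII input (already Punycode-encoded) is handled. It also assumes the RFC 1123 relaxation that lets a label start with a digit.

```ocaml
(* Hypothetical helpers for illustration; none of these names come from
   the Publicsuffix API. *)

(* True when predicate [p] holds for every character of [s]. *)
let string_for_all p s =
  let rec go i = i >= String.length s || (p s.[i] && go (i + 1)) in
  go 0

(* A label is 1..63 octets of letters, digits and hyphens, and must
   start and end with a letter or digit (assumption: leading digits are
   accepted, as relaxed by RFC 1123). *)
let is_valid_label (label : string) : bool =
  let n = String.length label in
  let is_let_dig c =
    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
  in
  n >= 1 && n <= 63
  && is_let_dig label.[0]
  && is_let_dig label.[n - 1]
  && string_for_all (fun c -> is_let_dig c || c = '-') label

(* Split an ASCII domain into lowercase labels. One trailing dot is
   ignored; an empty name, a leading dot, an empty label, or a name over
   255 octets in length-prefixed form yields None. *)
let labels_of_domain (domain : string) : string list option =
  let n = String.length domain in
  if n = 0 || domain.[0] = '.' then None
  else
    let d =
      if domain.[n - 1] = '.' then String.sub domain 0 (n - 1) else domain
    in
    let labels = String.split_on_char '.' (String.lowercase_ascii d) in
    (* One length octet per label plus the terminating root octet. *)
    let wire_len =
      List.fold_left (fun acc l -> acc + String.length l + 1) 1 labels
    in
    if wire_len <= 255 && List.for_all is_valid_label labels then Some labels
    else None

(* Keep only the last [k] elements of a list. *)
let rec last_n k l = if List.length l <= k then l else last_n k (List.tl l)

(* Given a known public suffix, return the suffix plus one more label.
   Returns None when the domain is invalid, does not end with the
   suffix, or is the suffix itself. *)
let registrable_with_suffix ~(suffix : string) (domain : string) : string option =
  match labels_of_domain domain, labels_of_domain suffix with
  | Some dl, Some sl ->
    let k = List.length sl in
    if List.length dl > k && last_n k dl = sl
    then Some (String.concat "." (last_n (k + 1) dl))
    else None
  | _ -> None

let () =
  match registrable_with_suffix ~suffix:"co.uk" "a.b.example.co.uk" with
  | Some d -> print_endline d   (* prints "example.co.uk" *)
  | None -> print_endline "no registrable domain"
```

In this sketch the suffix is an explicit argument so that the "one additional label" step can be read in isolation. A real lookup would first determine the suffix from the PSL rules, including wildcard and exception rules, which are not modelled here.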
+3077
spec/rfc1034.txt
··· 1 + Network Working Group P. Mockapetris 2 + Request for Comments: 1034 ISI 3 + Obsoletes: RFCs 882, 883, 973 November 1987 4 + 5 + 6 + DOMAIN NAMES - CONCEPTS AND FACILITIES 7 + 8 + 9 + 10 + 1. STATUS OF THIS MEMO 11 + 12 + This RFC is an introduction to the Domain Name System (DNS), and omits 13 + many details which can be found in a companion RFC, "Domain Names - 14 + Implementation and Specification" [RFC-1035]. That RFC assumes that the 15 + reader is familiar with the concepts discussed in this memo. 16 + 17 + A subset of DNS functions and data types constitute an official 18 + protocol. The official protocol includes standard queries and their 19 + responses and most of the Internet class data formats (e.g., host 20 + addresses). 21 + 22 + However, the domain system is intentionally extensible. Researchers are 23 + continuously proposing, implementing and experimenting with new data 24 + types, query types, classes, functions, etc. Thus while the components 25 + of the official protocol are expected to stay essentially unchanged and 26 + operate as a production service, experimental behavior should always be 27 + expected in extensions beyond the official protocol. Experimental or 28 + obsolete features are clearly marked in these RFCs, and such information 29 + should be used with caution. 30 + 31 + The reader is especially cautioned not to depend on the values which 32 + appear in examples to be current or complete, since their purpose is 33 + primarily pedagogical. Distribution of this memo is unlimited. 34 + 35 + 2. INTRODUCTION 36 + 37 + This RFC introduces domain style names, their use for Internet mail and 38 + host address support, and the protocols and servers used to implement 39 + domain name facilities. 40 + 41 + 2.1. 
The history of domain names 42 + 43 + The impetus for the development of the domain system was growth in the 44 + Internet: 45 + 46 + - Host name to address mappings were maintained by the Network 47 + Information Center (NIC) in a single file (HOSTS.TXT) which 48 + was FTPed by all hosts [RFC-952, RFC-953]. The total network 49 + 50 + 51 + 52 + Mockapetris [Page 1] 53 + 54 + RFC 1034 Domain Concepts and Facilities November 1987 55 + 56 + 57 + bandwidth consumed in distributing a new version by this 58 + scheme is proportional to the square of the number of hosts in 59 + the network, and even when multiple levels of FTP are used, 60 + the outgoing FTP load on the NIC host is considerable. 61 + Explosive growth in the number of hosts didn't bode well for 62 + the future. 63 + 64 + - The network population was also changing in character. The 65 + timeshared hosts that made up the original ARPANET were being 66 + replaced with local networks of workstations. Local 67 + organizations were administering their own names and 68 + addresses, but had to wait for the NIC to change HOSTS.TXT to 69 + make changes visible to the Internet at large. Organizations 70 + also wanted some local structure on the name space. 71 + 72 + - The applications on the Internet were getting more 73 + sophisticated and creating a need for general purpose name 74 + service. 75 + 76 + 77 + The result was several ideas about name spaces and their management 78 + [IEN-116, RFC-799, RFC-819, RFC-830]. The proposals varied, but a 79 + common thread was the idea of a hierarchical name space, with the 80 + hierarchy roughly corresponding to organizational structure, and names 81 + using "." as the character to mark the boundary between hierarchy 82 + levels. A design using a distributed database and generalized resources 83 + was described in [RFC-882, RFC-883]. Based on experience with several 84 + implementations, the system evolved into the scheme described in this 85 + memo. 
86 + 87 + The terms "domain" or "domain name" are used in many contexts beyond the 88 + DNS described here. Very often, the term domain name is used to refer 89 + to a name with structure indicated by dots, but no relation to the DNS. 90 + This is particularly true in mail addressing [Quarterman 86]. 91 + 92 + 2.2. DNS design goals 93 + 94 + The design goals of the DNS influence its structure. They are: 95 + 96 + - The primary goal is a consistent name space which will be used 97 + for referring to resources. In order to avoid the problems 98 + caused by ad hoc encodings, names should not be required to 99 + contain network identifiers, addresses, routes, or similar 100 + information as part of the name. 101 + 102 + - The sheer size of the database and frequency of updates 103 + suggest that it must be maintained in a distributed manner, 104 + with local caching to improve performance. Approaches that 105 + 106 + 107 + 108 + Mockapetris [Page 2] 109 + 110 + RFC 1034 Domain Concepts and Facilities November 1987 111 + 112 + 113 + attempt to collect a consistent copy of the entire database 114 + will become more and more expensive and difficult, and hence 115 + should be avoided. The same principle holds for the structure 116 + of the name space, and in particular mechanisms for creating 117 + and deleting names; these should also be distributed. 118 + 119 + - Where there tradeoffs between the cost of acquiring data, the 120 + speed of updates, and the accuracy of caches, the source of 121 + the data should control the tradeoff. 122 + 123 + - The costs of implementing such a facility dictate that it be 124 + generally useful, and not restricted to a single application. 125 + We should be able to use names to retrieve host addresses, 126 + mailbox data, and other as yet undetermined information. All 127 + data associated with a name is tagged with a type, and queries 128 + can be limited to a single type. 
129 + 130 + - Because we want the name space to be useful in dissimilar 131 + networks and applications, we provide the ability to use the 132 + same name space with different protocol families or 133 + management. For example, host address formats differ between 134 + protocols, though all protocols have the notion of address. 135 + The DNS tags all data with a class as well as the type, so 136 + that we can allow parallel use of different formats for data 137 + of type address. 138 + 139 + - We want name server transactions to be independent of the 140 + communications system that carries them. Some systems may 141 + wish to use datagrams for queries and responses, and only 142 + establish virtual circuits for transactions that need the 143 + reliability (e.g., database updates, long transactions); other 144 + systems will use virtual circuits exclusively. 145 + 146 + - The system should be useful across a wide spectrum of host 147 + capabilities. Both personal computers and large timeshared 148 + hosts should be able to use the system, though perhaps in 149 + different ways. 150 + 151 + 2.3. Assumptions about usage 152 + 153 + The organization of the domain system derives from some assumptions 154 + about the needs and usage patterns of its user community and is designed 155 + to avoid many of the the complicated problems found in general purpose 156 + database systems. 157 + 158 + The assumptions are: 159 + 160 + - The size of the total database will initially be proportional 161 + 162 + 163 + 164 + Mockapetris [Page 3] 165 + 166 + RFC 1034 Domain Concepts and Facilities November 1987 167 + 168 + 169 + to the number of hosts using the system, but will eventually 170 + grow to be proportional to the number of users on those hosts 171 + as mailboxes and other information are added to the domain 172 + system. 
173 + 174 + - Most of the data in the system will change very slowly (e.g., 175 + mailbox bindings, host addresses), but that the system should 176 + be able to deal with subsets that change more rapidly (on the 177 + order of seconds or minutes). 178 + 179 + - The administrative boundaries used to distribute 180 + responsibility for the database will usually correspond to 181 + organizations that have one or more hosts. Each organization 182 + that has responsibility for a particular set of domains will 183 + provide redundant name servers, either on the organization's 184 + own hosts or other hosts that the organization arranges to 185 + use. 186 + 187 + - Clients of the domain system should be able to identify 188 + trusted name servers they prefer to use before accepting 189 + referrals to name servers outside of this "trusted" set. 190 + 191 + - Access to information is more critical than instantaneous 192 + updates or guarantees of consistency. Hence the update 193 + process allows updates to percolate out through the users of 194 + the domain system rather than guaranteeing that all copies are 195 + simultaneously updated. When updates are unavailable due to 196 + network or host failure, the usual course is to believe old 197 + information while continuing efforts to update it. The 198 + general model is that copies are distributed with timeouts for 199 + refreshing. The distributor sets the timeout value and the 200 + recipient of the distribution is responsible for performing 201 + the refresh. In special situations, very short intervals can 202 + be specified, or the owner can prohibit copies. 203 + 204 + - In any system that has a distributed database, a particular 205 + name server may be presented with a query that can only be 206 + answered by some other server. 
The two general approaches to 207 + dealing with this problem are "recursive", in which the first 208 + server pursues the query for the client at another server, and 209 + "iterative", in which the server refers the client to another 210 + server and lets the client pursue the query. Both approaches 211 + have advantages and disadvantages, but the iterative approach 212 + is preferred for the datagram style of access. The domain 213 + system requires implementation of the iterative approach, but 214 + allows the recursive approach as an option. 215 + 216 + 217 + 218 + 219 + 220 + Mockapetris [Page 4] 221 + 222 + RFC 1034 Domain Concepts and Facilities November 1987 223 + 224 + 225 + The domain system assumes that all data originates in master files 226 + scattered through the hosts that use the domain system. These master 227 + files are updated by local system administrators. Master files are text 228 + files that are read by a local name server, and hence become available 229 + through the name servers to users of the domain system. The user 230 + programs access name servers through standard programs called resolvers. 231 + 232 + The standard format of master files allows them to be exchanged between 233 + hosts (via FTP, mail, or some other mechanism); this facility is useful 234 + when an organization wants a domain, but doesn't want to support a name 235 + server. The organization can maintain the master files locally using a 236 + text editor, transfer them to a foreign host which runs a name server, 237 + and then arrange with the system administrator of the name server to get 238 + the files loaded. 239 + 240 + Each host's name servers and resolvers are configured by a local system 241 + administrator [RFC-1033]. For a name server, this configuration data 242 + includes the identity of local master files and instructions on which 243 + non-local master files are to be loaded from foreign servers. 
The name 244 + server uses the master files or copies to load its zones. For 245 + resolvers, the configuration data identifies the name servers which 246 + should be the primary sources of information. 247 + 248 + The domain system defines procedures for accessing the data and for 249 + referrals to other name servers. The domain system also defines 250 + procedures for caching retrieved data and for periodic refreshing of 251 + data defined by the system administrator. 252 + 253 + The system administrators provide: 254 + 255 + - The definition of zone boundaries. 256 + 257 + - Master files of data. 258 + 259 + - Updates to master files. 260 + 261 + - Statements of the refresh policies desired. 262 + 263 + The domain system provides: 264 + 265 + - Standard formats for resource data. 266 + 267 + - Standard methods for querying the database. 268 + 269 + - Standard methods for name servers to refresh local data from 270 + foreign name servers. 271 + 272 + 273 + 274 + 275 + 276 + Mockapetris [Page 5] 277 + 278 + RFC 1034 Domain Concepts and Facilities November 1987 279 + 280 + 281 + 2.4. Elements of the DNS 282 + 283 + The DNS has three major components: 284 + 285 + - The DOMAIN NAME SPACE and RESOURCE RECORDS, which are 286 + specifications for a tree structured name space and data 287 + associated with the names. Conceptually, each node and leaf 288 + of the domain name space tree names a set of information, and 289 + query operations are attempts to extract specific types of 290 + information from a particular set. A query names the domain 291 + name of interest and describes the type of resource 292 + information that is desired. For example, the Internet 293 + uses some of its domain names to identify hosts; queries for 294 + address resources return Internet host addresses. 295 + 296 + - NAME SERVERS are server programs which hold information about 297 + the domain tree's structure and set information. 
A name 298 + server may cache structure or set information about any part 299 + of the domain tree, but in general a particular name server 300 + has complete information about a subset of the domain space, 301 + and pointers to other name servers that can be used to lead to 302 + information from any part of the domain tree. Name servers 303 + know the parts of the domain tree for which they have complete 304 + information; a name server is said to be an AUTHORITY for 305 + these parts of the name space. Authoritative information is 306 + organized into units called ZONEs, and these zones can be 307 + automatically distributed to the name servers which provide 308 + redundant service for the data in a zone. 309 + 310 + - RESOLVERS are programs that extract information from name 311 + servers in response to client requests. Resolvers must be 312 + able to access at least one name server and use that name 313 + server's information to answer a query directly, or pursue the 314 + query using referrals to other name servers. A resolver will 315 + typically be a system routine that is directly accessible to 316 + user programs; hence no protocol is necessary between the 317 + resolver and the user program. 318 + 319 + These three components roughly correspond to the three layers or views 320 + of the domain system: 321 + 322 + - From the user's point of view, the domain system is accessed 323 + through a simple procedure or OS call to a local resolver. 324 + The domain space consists of a single tree and the user can 325 + request information from any section of the tree. 326 + 327 + - From the resolver's point of view, the domain system is 328 + composed of an unknown number of name servers. 
Each name 329 + 330 + 331 + 332 + Mockapetris [Page 6] 333 + 334 + RFC 1034 Domain Concepts and Facilities November 1987 335 + 336 + 337 + server has one or more pieces of the whole domain tree's data, 338 + but the resolver views each of these databases as essentially 339 + static. 340 + 341 + - From a name server's point of view, the domain system consists 342 + of separate sets of local information called zones. The name 343 + server has local copies of some of the zones. The name server 344 + must periodically refresh its zones from master copies in 345 + local files or foreign name servers. The name server must 346 + concurrently process queries that arrive from resolvers. 347 + 348 + In the interests of performance, implementations may couple these 349 + functions. For example, a resolver on the same machine as a name server 350 + might share a database consisting of the the zones managed by the name 351 + server and the cache managed by the resolver. 352 + 353 + 3. DOMAIN NAME SPACE and RESOURCE RECORDS 354 + 355 + 3.1. Name space specifications and terminology 356 + 357 + The domain name space is a tree structure. Each node and leaf on the 358 + tree corresponds to a resource set (which may be empty). The domain 359 + system makes no distinctions between the uses of the interior nodes and 360 + leaves, and this memo uses the term "node" to refer to both. 361 + 362 + Each node has a label, which is zero to 63 octets in length. Brother 363 + nodes may not have the same label, although the same label can be used 364 + for nodes which are not brothers. One label is reserved, and that is 365 + the null (i.e., zero length) label used for the root. 366 + 367 + The domain name of a node is the list of the labels on the path from the 368 + node to the root of the tree. 
By convention, the labels that compose a 369 + domain name are printed or read left to right, from the most specific 370 + (lowest, farthest from the root) to the least specific (highest, closest 371 + to the root). 372 + 373 + Internally, programs that manipulate domain names should represent them 374 + as sequences of labels, where each label is a length octet followed by 375 + an octet string. Because all domain names end at the root, which has a 376 + null string for a label, these internal representations can use a length 377 + byte of zero to terminate a domain name. 378 + 379 + By convention, domain names can be stored with arbitrary case, but 380 + domain name comparisons for all present domain functions are done in a 381 + case-insensitive manner, assuming an ASCII character set, and a high 382 + order zero bit. This means that you are free to create a node with 383 + label "A" or a node with label "a", but not both as brothers; you could 384 + refer to either using "a" or "A". When you receive a domain name or 385 + 386 + 387 + 388 + Mockapetris [Page 7] 389 + 390 + RFC 1034 Domain Concepts and Facilities November 1987 391 + 392 + 393 + label, you should preserve its case. The rationale for this choice is 394 + that we may someday need to add full binary domain names for new 395 + services; existing services would not be changed. 396 + 397 + When a user needs to type a domain name, the length of each label is 398 + omitted and the labels are separated by dots ("."). Since a complete 399 + domain name ends with the root label, this leads to a printed form which 400 + ends in a dot. We use this property to distinguish between: 401 + 402 + - a character string which represents a complete domain name 403 + (often called "absolute"). For example, "poneria.ISI.EDU." 
404 + 405 + - a character string that represents the starting labels of a 406 + domain name which is incomplete, and should be completed by 407 + local software using knowledge of the local domain (often 408 + called "relative"). For example, "poneria" used in the 409 + ISI.EDU domain. 410 + 411 + Relative names are either taken relative to a well known origin, or to a 412 + list of domains used as a search list. Relative names appear mostly at 413 + the user interface, where their interpretation varies from 414 + implementation to implementation, and in master files, where they are 415 + relative to a single origin domain name. The most common interpretation 416 + uses the root "." as either the single origin or as one of the members 417 + of the search list, so a multi-label relative name is often one where 418 + the trailing dot has been omitted to save typing. 419 + 420 + To simplify implementations, the total number of octets that represent a 421 + domain name (i.e., the sum of all label octets and label lengths) is 422 + limited to 255. 423 + 424 + A domain is identified by a domain name, and consists of that part of 425 + the domain name space that is at or below the domain name which 426 + specifies the domain. A domain is a subdomain of another domain if it 427 + is contained within that domain. This relationship can be tested by 428 + seeing if the subdomain's name ends with the containing domain's name. 429 + For example, A.B.C.D is a subdomain of B.C.D, C.D, D, and " ". 430 + 431 + 3.2. Administrative guidelines on use 432 + 433 + As a matter of policy, the DNS technical specifications do not mandate a 434 + particular tree structure or rules for selecting labels; its goal is to 435 + be as general as possible, so that it can be used to build arbitrary 436 + applications. In particular, the system was designed so that the name 437 + space did not have to be organized along the lines of network 438 + boundaries, name servers, etc. 
The rationale for this is not that the 439 + name space should have no implied semantics, but rather that the choice 440 + of implied semantics should be left open to be used for the problem at 441 + 442 + 443 + 444 + Mockapetris [Page 8] 445 + 446 + RFC 1034 Domain Concepts and Facilities November 1987 447 + 448 + 449 + hand, and that different parts of the tree can have different implied 450 + semantics. For example, the IN-ADDR.ARPA domain is organized and 451 + distributed by network and host address because its role is to translate 452 + from network or host numbers to names; NetBIOS domains [RFC-1001, RFC- 453 + 1002] are flat because that is appropriate for that application. 454 + 455 + However, there are some guidelines that apply to the "normal" parts of 456 + the name space used for hosts, mailboxes, etc., that will make the name 457 + space more uniform, provide for growth, and minimize problems as 458 + software is converted from the older host table. The political 459 + decisions about the top levels of the tree originated in RFC-920. 460 + Current policy for the top levels is discussed in [RFC-1032]. MILNET 461 + conversion issues are covered in [RFC-1031]. 462 + 463 + Lower domains which will eventually be broken into multiple zones should 464 + provide branching at the top of the domain so that the eventual 465 + decomposition can be done without renaming. Node labels which use 466 + special characters, leading digits, etc., are likely to break older 467 + software which depends on more restrictive choices. 468 + 469 + 3.3. Technical guidelines on use 470 + 471 + Before the DNS can be used to hold naming information for some kind of 472 + object, two needs must be met: 473 + 474 + - A convention for mapping between object names and domain 475 + names. This describes how information about an object is 476 + accessed. 477 + 478 + - RR types and data formats for describing the object. 479 + 480 + These rules can be quite simple or fairly complex. 
Very often, the 481 + designer must take into account existing formats and plan for upward 482 + compatibility for existing usage. Multiple mappings or levels of 483 + mapping may be required. 484 + 485 + For hosts, the mapping depends on the existing syntax for host names 486 + which is a subset of the usual text representation for domain names, 487 + together with RR formats for describing host addresses, etc. Because we 488 + need a reliable inverse mapping from address to host name, a special 489 + mapping for addresses into the IN-ADDR.ARPA domain is also defined. 490 + 491 + For mailboxes, the mapping is slightly more complex. The usual mail 492 + address <local-part>@<mail-domain> is mapped into a domain name by 493 + converting <local-part> into a single label (regardles of dots it 494 + contains), converting <mail-domain> into a domain name using the usual 495 + text format for domain names (dots denote label breaks), and 496 + concatenating the two to form a single domain name. Thus the mailbox 497 + 498 + 499 + 500 + Mockapetris [Page 9] 501 + 502 + RFC 1034 Domain Concepts and Facilities November 1987 503 + 504 + 505 + HOSTMASTER@SRI-NIC.ARPA is represented as a domain name by 506 + HOSTMASTER.SRI-NIC.ARPA. An appreciation for the reasons behind this 507 + design also must take into account the scheme for mail exchanges [RFC- 508 + 974]. 509 + 510 + The typical user is not concerned with defining these rules, but should 511 + understand that they usually are the result of numerous compromises 512 + between desires for upward compatibility with old usage, interactions 513 + between different object definitions, and the inevitable urge to add new 514 + features when defining the rules. The way the DNS is used to support 515 + some object is often more crucial than the restrictions inherent in the 516 + DNS. 517 + 518 + 3.4. 
Example name space 519 + 520 + The following figure shows a part of the current domain name space, and 521 + is used in many examples in this RFC. Note that the tree is a very 522 + small subset of the actual name space. 523 + 524 + | 525 + | 526 + +---------------------+------------------+ 527 + | | | 528 + MIL EDU ARPA 529 + | | | 530 + | | | 531 + +-----+-----+ | +------+-----+-----+ 532 + | | | | | | | 533 + BRL NOSC DARPA | IN-ADDR SRI-NIC ACC 534 + | 535 + +--------+------------------+---------------+--------+ 536 + | | | | | 537 + UCI MIT | UDEL YALE 538 + | ISI 539 + | | 540 + +---+---+ | 541 + | | | 542 + LCS ACHILLES +--+-----+-----+--------+ 543 + | | | | | | 544 + XX A C VAXA VENERA Mockapetris 545 + 546 + In this example, the root domain has three immediate subdomains: MIL, 547 + EDU, and ARPA. The LCS.MIT.EDU domain has one immediate subdomain named 548 + XX.LCS.MIT.EDU. All of the leaves are also domains. 549 + 550 + 3.5. Preferred name syntax 551 + 552 + The DNS specifications attempt to be as general as possible in the rules 553 + 554 + 555 + 556 + Mockapetris [Page 10] 557 + 558 + RFC 1034 Domain Concepts and Facilities November 1987 559 + 560 + 561 + for constructing domain names. The idea is that the name of any 562 + existing object can be expressed as a domain name with minimal changes. 563 + However, when assigning a domain name for an object, the prudent user 564 + will select a name which satisfies both the rules of the domain system 565 + and any existing rules for the object, whether these rules are published 566 + or implied by existing programs. 567 + 568 + For example, when naming a mail domain, the user should satisfy both the 569 + rules of this memo and those in RFC-822. When creating a new host name, 570 + the old rules for HOSTS.TXT should be followed. This avoids problems 571 + when old software is converted to use domain names. 
572 + 573 + The following syntax will result in fewer problems with many 574 + applications that use domain names (e.g., mail, TELNET). 575 + 576 + <domain> ::= <subdomain> | " " 577 + 578 + <subdomain> ::= <label> | <subdomain> "." <label> 579 + 580 + <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ] 581 + 582 + <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str> 583 + 584 + <let-dig-hyp> ::= <let-dig> | "-" 585 + 586 + <let-dig> ::= <letter> | <digit> 587 + 588 + <letter> ::= any one of the 52 alphabetic characters A through Z in 589 + upper case and a through z in lower case 590 + 591 + <digit> ::= any one of the ten digits 0 through 9 592 + 593 + Note that while upper and lower case letters are allowed in domain 594 + names, no significance is attached to the case. That is, two names with 595 + the same spelling but different case are to be treated as if identical. 596 + 597 + The labels must follow the rules for ARPANET host names. They must 598 + start with a letter, end with a letter or digit, and have as interior 599 + characters only letters, digits, and hyphen. There are also some 600 + restrictions on the length. Labels must be 63 characters or less. 601 + 602 + For example, the following strings identify hosts in the Internet: 603 + 604 + A.ISI.EDU XX.LCS.MIT.EDU SRI-NIC.ARPA 605 + 606 + 3.6. Resource Records 607 + 608 + A domain name identifies a node. Each node has a set of resource 609 + 610 + 611 + 612 + Mockapetris [Page 11] 613 + 614 + RFC 1034 Domain Concepts and Facilities November 1987 615 + 616 + 617 + information, which may be empty. The set of resource information 618 + associated with a particular name is composed of separate resource 619 + records (RRs). The order of RRs in a set is not significant, and need 620 + not be preserved by name servers, resolvers, or other parts of the DNS. 621 + 622 + When we talk about a specific RR, we assume it has the following: 623 + 624 + owner which is the domain name where the RR is found. 
625 + 626 + type which is an encoded 16 bit value that specifies the type 627 + of the resource in this resource record. Types refer to 628 + abstract resources. 629 + 630 + This memo uses the following types: 631 + 632 + A a host address 633 + 634 + CNAME identifies the canonical name of an 635 + alias 636 + 637 + HINFO identifies the CPU and OS used by a host 638 + 639 + MX identifies a mail exchange for the 640 + domain. See [RFC-974] for details. 641 + 642 + NS 643 + the authoritative name server for the domain 644 + 645 + PTR 646 + a pointer to another part of the domain name space 647 + 648 + SOA 649 + identifies the start of a zone of authority 650 + 651 + class which is an encoded 16 bit value which identifies a 652 + protocol family or instance of a protocol. 653 + 654 + This memo uses the following classes: 655 + 656 + IN the Internet system 657 + 658 + CH the Chaos system 659 + 660 + TTL which is the time to live of the RR. This field is a 32 661 + bit integer in units of seconds, and is primarily used by 662 + resolvers when they cache RRs. The TTL describes how 663 + long an RR can be cached before it should be discarded. 664 + 665 + 666 + 667 + 668 + Mockapetris [Page 12] 669 + 670 + RFC 1034 Domain Concepts and Facilities November 1987 671 + 672 + 673 + RDATA which is the type and sometimes class dependent data 674 + which describes the resource: 675 + 676 + A For the IN class, a 32 bit IP address 677 + 678 + For the CH class, a domain name followed 679 + by a 16 bit octal Chaos address. 680 + 681 + CNAME a domain name. 682 + 683 + MX a 16 bit preference value (lower is 684 + better) followed by a host name willing 685 + to act as a mail exchange for the owner 686 + domain. 687 + 688 + NS a host name. 689 + 690 + PTR a domain name. 691 + 692 + SOA several fields. 693 + 694 + The owner name is often implicit, rather than forming an integral part 695 + of the RR.
For example, many name servers internally form tree or hash 696 + structures for the name space, and chain RRs off nodes. The remaining 697 + RR parts are the fixed header (type, class, TTL) which is consistent for 698 + all RRs, and a variable part (RDATA) that fits the needs of the resource 699 + being described. 700 + 701 + The meaning of the TTL field is a time limit on how long an RR can be 702 + kept in a cache. This limit does not apply to authoritative data in 703 + zones; it is also timed out, but by the refreshing policies for the 704 + zone. The TTL is assigned by the administrator for the zone where the 705 + data originates. While short TTLs can be used to minimize caching, and 706 + a zero TTL prohibits caching, the realities of Internet performance 707 + suggest that these times should be on the order of days for the typical 708 + host. If a change can be anticipated, the TTL can be reduced prior to 709 + the change to minimize inconsistency during the change, and then 710 + increased back to its former value following the change. 711 + 712 + The data in the RDATA section of RRs is carried as a combination of 713 + binary strings and domain names. The domain names are frequently used 714 + as "pointers" to other data in the DNS. 715 + 716 + 3.6.1. Textual expression of RRs 717 + 718 + RRs are represented in binary form in the packets of the DNS protocol, 719 + and are usually represented in highly encoded form when stored in a name 720 + server or resolver. In this memo, we adopt a style similar to that used 721 + 722 + 723 + 724 + Mockapetris [Page 13] 725 + 726 + RFC 1034 Domain Concepts and Facilities November 1987 727 + 728 + 729 + in master files in order to show the contents of RRs. In this format, 730 + most RRs are shown on a single line, although continuation lines are 731 + possible using parentheses. 732 + 733 + The start of the line gives the owner of the RR. 
If a line begins with 734 + a blank, then the owner is assumed to be the same as that of the 735 + previous RR. Blank lines are often included for readability. 736 + 737 + Following the owner, we list the TTL, type, and class of the RR. Class 738 + and type use the mnemonics defined above, and TTL is an integer before 739 + the type field. In order to avoid ambiguity in parsing, type and class 740 + mnemonics are disjoint, TTLs are integers, and the type mnemonic is 741 + always last. The IN class and TTL values are often omitted from examples 742 + in the interests of clarity. 743 + 744 + The resource data or RDATA section of the RR are given using knowledge 745 + of the typical representation for the data. 746 + 747 + For example, we might show the RRs carried in a message as: 748 + 749 + ISI.EDU. MX 10 VENERA.ISI.EDU. 750 + MX 10 VAXA.ISI.EDU. 751 + VENERA.ISI.EDU. A 128.9.0.32 752 + A 10.1.0.52 753 + VAXA.ISI.EDU. A 10.2.0.27 754 + A 128.9.0.33 755 + 756 + The MX RRs have an RDATA section which consists of a 16 bit number 757 + followed by a domain name. The address RRs use a standard IP address 758 + format to contain a 32 bit internet address. 759 + 760 + This example shows six RRs, with two RRs at each of three domain names. 761 + 762 + Similarly we might see: 763 + 764 + XX.LCS.MIT.EDU. IN A 10.0.0.44 765 + CH A MIT.EDU. 2420 766 + 767 + This example shows two addresses for XX.LCS.MIT.EDU, each of a different 768 + class. 769 + 770 + 3.6.2. Aliases and canonical names 771 + 772 + In existing systems, hosts and other resources often have several names 773 + that identify the same resource. For example, the names C.ISI.EDU and 774 + USC-ISIC.ARPA both identify the same host. 
Similarly, in the case of 775 + mailboxes, many organizations provide many names that actually go to the 776 + same mailbox; for example Mockapetris@C.ISI.EDU, Mockapetris@B.ISI.EDU, 777 + 778 + 779 + 780 + Mockapetris [Page 14] 781 + 782 + RFC 1034 Domain Concepts and Facilities November 1987 783 + 784 + 785 + and PVM@ISI.EDU all go to the same mailbox (although the mechanism 786 + behind this is somewhat complicated). 787 + 788 + Most of these systems have a notion that one of the equivalent set of 789 + names is the canonical or primary name and all others are aliases. 790 + 791 + The domain system provides such a feature using the canonical name 792 + (CNAME) RR. A CNAME RR identifies its owner name as an alias, and 793 + specifies the corresponding canonical name in the RDATA section of the 794 + RR. If a CNAME RR is present at a node, no other data should be 795 + present; this ensures that the data for a canonical name and its aliases 796 + cannot be different. This rule also insures that a cached CNAME can be 797 + used without checking with an authoritative server for other RR types. 798 + 799 + CNAME RRs cause special action in DNS software. When a name server 800 + fails to find a desired RR in the resource set associated with the 801 + domain name, it checks to see if the resource set consists of a CNAME 802 + record with a matching class. If so, the name server includes the CNAME 803 + record in the response and restarts the query at the domain name 804 + specified in the data field of the CNAME record. The one exception to 805 + this rule is that queries which match the CNAME type are not restarted. 
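The CNAME processing described above can be sketched in a few lines. This is only an illustration under stated assumptions, not part of the RFC text: the `records` dictionary, the `lookup` function, and the `max_hops` guard against looping aliases are all hypothetical names introduced for this example, and class matching is omitted for brevity.

```python
# Hypothetical in-memory RR store: (owner name, type) -> list of RDATA values.
# Owner names are stored in upper case because DNS comparisons ignore case.
records = {
    ("USC-ISIC.ARPA", "CNAME"): ["C.ISI.EDU"],
    ("C.ISI.EDU", "A"): ["10.0.0.52"],
}


def lookup(qname, qtype, max_hops=8):
    """Collect answer RRs for (qname, qtype), restarting at CNAME targets.

    max_hops is an assumed safety limit so that an alias loop ends in an
    error instead of running forever.
    """
    answer = []
    name = qname.upper()
    for _ in range(max_hops):
        direct = records.get((name, qtype), [])
        if direct:
            # The desired type is present at this name. A query for type
            # CNAME ends here too, so CNAME queries are never restarted.
            answer.extend((name, qtype, rdata) for rdata in direct)
            return answer
        alias = records.get((name, "CNAME"), [])
        if not alias:
            # Neither the desired RR nor an alias exists at this name.
            return answer
        # Include the CNAME RR in the response and restart the query at
        # the canonical name it specifies.
        canonical = alias[0]
        answer.append((name, "CNAME", canonical))
        name = canonical.upper()
    raise RuntimeError("CNAME chain too long or looping")
```

Calling `lookup("USC-ISIC.ARPA", "A")` on this data yields the CNAME RR followed by the A RR, while `lookup("USC-ISIC.ARPA", "CNAME")` yields only the CNAME RR.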
806 + 807 + For example, suppose a name server was processing a query for USC- 808 + ISIC.ARPA, asking for type A information, and had the following resource 809 + records: 810 + 811 + USC-ISIC.ARPA IN CNAME C.ISI.EDU 812 + 813 + C.ISI.EDU IN A 10.0.0.52 814 + 815 + Both of these RRs would be returned in the response to the type A query, 816 + while a type CNAME or * query should return just the CNAME. 817 + 818 + Domain names in RRs which point at another name should always point at 819 + the primary name and not the alias. This avoids extra indirections in 820 + accessing information. For example, the address to name RR for the 821 + above host should be: 822 + 823 + 52.0.0.10.IN-ADDR.ARPA IN PTR C.ISI.EDU 824 + 825 + rather than pointing at USC-ISIC.ARPA. Of course, by the robustness 826 + principle, domain software should not fail when presented with CNAME 827 + chains or loops; CNAME chains should be followed and CNAME loops 828 + signalled as an error. 829 + 830 + 3.7. Queries 831 + 832 + Queries are messages which may be sent to a name server to provoke a 833 + 834 + 835 + 836 + Mockapetris [Page 15] 837 + 838 + RFC 1034 Domain Concepts and Facilities November 1987 839 + 840 + 841 + response. In the Internet, queries are carried in UDP datagrams or over 842 + TCP connections. The response by the name server either answers the 843 + question posed in the query, refers the requester to another set of name 844 + servers, or signals some error condition. 845 + 846 + In general, the user does not generate queries directly, but instead 847 + makes a request to a resolver which in turn sends one or more queries to 848 + name servers and deals with the error conditions and referrals that may 849 + result. Of course, the possible questions which can be asked in a query 850 + do shape the kind of service a resolver can provide. 851 + 852 + DNS queries and responses are carried in a standard message format.
The 853 + message format has a header containing a number of fixed fields which 854 + are always present, and four sections which carry query parameters and 855 + RRs. 856 + 857 + The most important field in the header is a four bit field called an 858 + opcode which separates different queries. Of the possible 16 values, 859 + one (standard query) is part of the official protocol, two (inverse 860 + query and status query) are options, one (completion) is obsolete, and 861 + the rest are unassigned. 862 + 863 + The four sections are: 864 + 865 + Question Carries the query name and other query parameters. 866 + 867 + Answer Carries RRs which directly answer the query. 868 + 869 + Authority Carries RRs which describe other authoritative servers. 870 + May optionally carry the SOA RR for the authoritative 871 + data in the answer section. 872 + 873 + Additional Carries RRs which may be helpful in using the RRs in the 874 + other sections. 875 + 876 + Note that the content, but not the format, of these sections varies with 877 + header opcode. 878 + 879 + 3.7.1. Standard queries 880 + 881 + A standard query specifies a target domain name (QNAME), query type 882 + (QTYPE), and query class (QCLASS) and asks for RRs which match. This 883 + type of query makes up such a vast majority of DNS queries that we use 884 + the term "query" to mean standard query unless otherwise specified. The 885 + QTYPE and QCLASS fields are each 16 bits long, and are a superset of 886 + defined types and classes. 887 + 888 + 889 + 890 + 891 + 892 + Mockapetris [Page 16] 893 + 894 + RFC 1034 Domain Concepts and Facilities November 1987 895 + 896 + 897 + The QTYPE field may contain: 898 + 899 + <any type> matches just that type. (e.g., A, PTR). 900 + 901 + AXFR special zone transfer QTYPE. 902 + 903 + MAILB matches all mail box related RRs (e.g. MB and MG). 904 + 905 + * matches all RR types. 
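As an illustration of the QTYPE values listed above, the matching rule can be sketched as a small predicate. The names `qtype_matches` and `MAILBOX_TYPES` are assumptions made for this example. The set holds only MB and MG because those are the only mailbox-related types the text names. Returning False for AXFR is also an assumption of the sketch: AXFR requests a zone transfer, so it is handled elsewhere rather than matched against individual RRs.

```python
# Mailbox-related RR types; limited to the two examples given in the text.
MAILBOX_TYPES = {"MB", "MG"}


def qtype_matches(qtype, rr_type):
    """Return True if an RR of type rr_type satisfies a query of type qtype."""
    if qtype == "*":
        # "*" matches all RR types.
        return True
    if qtype == "MAILB":
        # MAILB matches all mailbox-related RRs.
        return rr_type in MAILBOX_TYPES
    if qtype == "AXFR":
        # Zone transfers are not a per-RR match (assumption of this sketch).
        return False
    # Any ordinary type matches just that type.
    return qtype == rr_type
```

For instance, `qtype_matches("MAILB", "MG")` is true while `qtype_matches("A", "PTR")` is false.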
906 + 907 + The QCLASS field may contain: 908 + 909 + <any class> matches just that class (e.g., IN, CH). 910 + 911 + * matches all RR classes. 912 + 913 + Using the query domain name, QTYPE, and QCLASS, the name server looks 914 + for matching RRs. In addition to relevant records, the name server may 915 + return RRs that point toward a name server that has the desired 916 + information or RRs that are expected to be useful in interpreting the 917 + relevant RRs. For example, a name server that doesn't have the 918 + requested information may know a name server that does; a name server 919 + that returns a domain name in a relevant RR may also return the RR that 920 + binds that domain name to an address. 921 + 922 + For example, a mailer trying to send mail to Mockapetris@ISI.EDU might 923 + ask the resolver for mail information about ISI.EDU, resulting in a 924 + query for QNAME=ISI.EDU, QTYPE=MX, QCLASS=IN. The response's answer 925 + section would be: 926 + 927 + ISI.EDU. MX 10 VENERA.ISI.EDU. 928 + MX 10 VAXA.ISI.EDU. 929 + 930 + while the additional section might be: 931 + 932 + VAXA.ISI.EDU. A 10.2.0.27 933 + A 128.9.0.33 934 + VENERA.ISI.EDU. A 10.1.0.52 935 + A 128.9.0.32 936 + 937 + Because the server assumes that if the requester wants mail exchange 938 + information, it will probably want the addresses of the mail exchanges 939 + soon afterward. 940 + 941 + Note that the QCLASS=* construct requires special interpretation 942 + regarding authority. Since a particular name server may not know all of 943 + the classes available in the domain system, it can never know if it is 944 + authoritative for all classes. Hence responses to QCLASS=* queries can 945 + 946 + 947 + 948 + Mockapetris [Page 17] 949 + 950 + RFC 1034 Domain Concepts and Facilities November 1987 951 + 952 + 953 + never be authoritative. 954 + 955 + 3.7.2.
Inverse queries (Optional) 956 + 957 + Name servers may also support inverse queries that map a particular 958 + resource to a domain name or domain names that have that resource. For 959 + example, while a standard query might map a domain name to a SOA RR, the 960 + corresponding inverse query might map the SOA RR back to the domain 961 + name. 962 + 963 + Implementation of this service is optional in a name server, but all 964 + name servers must at least be able to understand an inverse query 965 + message and return a not-implemented error response. 966 + 967 + The domain system cannot guarantee the completeness or uniqueness of 968 + inverse queries because the domain system is organized by domain name 969 + rather than by host address or any other resource type. Inverse queries 970 + are primarily useful for debugging and database maintenance activities. 971 + 972 + Inverse queries may not return the proper TTL, and do not indicate cases 973 + where the identified RR is one of a set (for example, one address for a 974 + host having multiple addresses). Therefore, the RRs returned in inverse 975 + queries should never be cached. 976 + 977 + Inverse queries are NOT an acceptable method for mapping host addresses 978 + to host names; use the IN-ADDR.ARPA domain instead. 979 + 980 + A detailed discussion of inverse queries is contained in [RFC-1035]. 981 + 982 + 3.8. Status queries (Experimental) 983 + 984 + To be defined. 985 + 986 + 3.9. Completion queries (Obsolete) 987 + 988 + The optional completion services described in RFCs 882 and 883 have been 989 + deleted. Redesigned services may become available in the future, or the 990 + opcodes may be reclaimed for other use. 991 + 992 + 4. NAME SERVERS 993 + 994 + 4.1. Introduction 995 + 996 + Name servers are the repositories of information that make up the domain 997 + database. The database is divided up into sections called zones, which 998 + are distributed among the name servers. 
While name servers can have 999 + several optional functions and sources of data, the essential task of a 1000 + name server is to answer queries using data in its zones. By design, 1001 + 1002 + 1003 + 1004 + Mockapetris [Page 18] 1005 + 1006 + RFC 1034 Domain Concepts and Facilities November 1987 1007 + 1008 + 1009 + name servers can answer queries in a simple manner; the response can 1010 + always be generated using only local data, and either contains the 1011 + answer to the question or a referral to other name servers "closer" to 1012 + the desired information. 1013 + 1014 + A given zone will be available from several name servers to insure its 1015 + availability in spite of host or communication link failure. By 1016 + administrative fiat, we require every zone to be available on at least 1017 + two servers, and many zones have more redundancy than that. 1018 + 1019 + A given name server will typically support one or more zones, but this 1020 + gives it authoritative information about only a small section of the 1021 + domain tree. It may also have some cached non-authoritative data about 1022 + other parts of the tree. The name server marks its responses to queries 1023 + so that the requester can tell whether the response comes from 1024 + authoritative data or not. 1025 + 1026 + 4.2. How the database is divided into zones 1027 + 1028 + The domain database is partitioned in two ways: by class, and by "cuts" 1029 + made in the name space between nodes. 1030 + 1031 + The class partition is simple. The database for any class is organized, 1032 + delegated, and maintained separately from all other classes. Since, by 1033 + convention, the name spaces are the same for all classes, the separate 1034 + classes can be thought of as an array of parallel namespace trees. Note 1035 + that the data attached to nodes will be different for these different 1036 + parallel classes. 
The most common reasons for creating a new class are 1037 + the necessity for a new data format for existing types or a desire for a 1038 + separately managed version of the existing name space. 1039 + 1040 + Within a class, "cuts" in the name space can be made between any two 1041 + adjacent nodes. After all cuts are made, each group of connected name 1042 + space is a separate zone. The zone is said to be authoritative for all 1043 + names in the connected region. Note that the "cuts" in the name space 1044 + may be in different places for different classes, the name servers may 1045 + be different, etc. 1046 + 1047 + These rules mean that every zone has at least one node, and hence domain 1048 + name, for which it is authoritative, and all of the nodes in a 1049 + particular zone are connected. Given the tree structure, every zone 1050 + has a highest node which is closer to the root than any other node in 1051 + the zone. The name of this node is often used to identify the zone. 1052 + 1053 + It would be possible, though not particularly useful, to partition the 1054 + name space so that each domain name was in a separate zone or so that 1055 + all nodes were in a single zone. Instead, the database is partitioned 1056 + at points where a particular organization wants to take over control of 1057 + 1058 + 1059 + 1060 + Mockapetris [Page 19] 1061 + 1062 + RFC 1034 Domain Concepts and Facilities November 1987 1063 + 1064 + 1065 + a subtree. Once an organization controls its own zone it can 1066 + unilaterally change the data in the zone, grow new tree sections 1067 + connected to the zone, delete existing nodes, or delegate new subzones 1068 + under its zone. 1069 + 1070 + If the organization has substructure, it may want to make further 1071 + internal partitions to achieve nested delegations of name space control. 1072 + In some cases, such divisions are made purely to make database 1073 + maintenance more convenient. 1074 + 1075 + 4.2.1.
Technical considerations 1076 + 1077 + The data that describes a zone has four major parts: 1078 + 1079 + - Authoritative data for all nodes within the zone. 1080 + 1081 + - Data that defines the top node of the zone (can be thought of 1082 + as part of the authoritative data). 1083 + 1084 + - Data that describes delegated subzones, i.e., cuts around the 1085 + bottom of the zone. 1086 + 1087 + - Data that allows access to name servers for subzones 1088 + (sometimes called "glue" data). 1089 + 1090 + All of this data is expressed in the form of RRs, so a zone can be 1091 + completely described in terms of a set of RRs. Whole zones can be 1092 + transferred between name servers by transferring the RRs, either carried 1093 + in a series of messages or by FTPing a master file which is a textual 1094 + representation. 1095 + 1096 + The authoritative data for a zone is simply all of the RRs attached to 1097 + all of the nodes from the top node of the zone down to leaf nodes or 1098 + nodes above cuts around the bottom edge of the zone. 1099 + 1100 + Though logically part of the authoritative data, the RRs that describe 1101 + the top node of the zone are especially important to the zone's 1102 + management. These RRs are of two types: name server RRs that list, one 1103 + per RR, all of the servers for the zone, and a single SOA RR that 1104 + describes zone management parameters. 1105 + 1106 + The RRs that describe cuts around the bottom of the zone are NS RRs that 1107 + name the servers for the subzones. Since the cuts are between nodes, 1108 + these RRs are NOT part of the authoritative data of the zone, and should 1109 + be exactly the same as the corresponding RRs in the top node of the 1110 + subzone. Since name servers are always associated with zone boundaries, 1111 + NS RRs are only found at nodes which are the top node of some zone. 
In 1112 + the data that makes up a zone, NS RRs are found at the top node of the 1113 + 1114 + 1115 + 1116 + Mockapetris [Page 20] 1117 + 1118 + RFC 1034 Domain Concepts and Facilities November 1987 1119 + 1120 + 1121 + zone (and are authoritative) and at cuts around the bottom of the zone 1122 + (where they are not authoritative), but never in between. 1123 + 1124 + One of the goals of the zone structure is that any zone have all the 1125 + data required to set up communications with the name servers for any 1126 + subzones. That is, parent zones have all the information needed to 1127 + access servers for their children zones. The NS RRs that name the 1128 + servers for subzones are often not enough for this task since they name 1129 + the servers, but do not give their addresses. In particular, if the 1130 + name of the name server is itself in the subzone, we could be faced with 1131 + the situation where the NS RRs tell us that in order to learn a name 1132 + server's address, we should contact the server using the address we wish 1133 + to learn. To fix this problem, a zone contains "glue" RRs which are not 1134 + part of the authoritative data, and are address RRs for the servers. 1135 + These RRs are only necessary if the name server's name is "below" the 1136 + cut, and are only used as part of a referral response. 1137 + 1138 + 4.2.2. Administrative considerations 1139 + 1140 + When some organization wants to control its own domain, the first step 1141 + is to identify the proper parent zone, and get the parent zone's owners 1142 + to agree to the delegation of control. While there are no particular 1143 + technical constraints dealing with where in the tree this can be done, 1144 + there are some administrative groupings discussed in [RFC-1032] which 1145 + deal with top level organization, and middle level zones are free to 1146 + create their own rules. 
For example, one university might choose to use 1147 + a single zone, while another might choose to organize by subzones 1148 + dedicated to individual departments or schools. [RFC-1033] catalogs 1149 + available DNS software and discusses administration procedures. 1150 + 1151 + Once the proper name for the new subzone is selected, the new owners 1152 + should be required to demonstrate redundant name server support. Note 1153 + that there is no requirement that the servers for a zone reside in a 1154 + host which has a name in that domain. In many cases, a zone will be 1155 + more accessible to the internet at large if its servers are widely 1156 + distributed rather than being within the physical facilities controlled 1157 + by the same organization that manages the zone. For example, in the 1158 + current DNS, one of the name servers for the United Kingdom, or UK 1159 + domain, is found in the US. This allows US hosts to get UK data without 1160 + using limited transatlantic bandwidth. 1161 + 1162 + As the last installation step, the delegation NS RRs and glue RRs 1163 + necessary to make the delegation effective should be added to the parent 1164 + zone. The administrators of both zones should insure that the NS and 1165 + glue RRs which mark both sides of the cut are consistent and remain so. 1166 + 1167 + 4.3. Name server internals 1168 + 1169 + 1170 + 1171 + 1172 + Mockapetris [Page 21] 1173 + 1174 + RFC 1034 Domain Concepts and Facilities November 1987 1175 + 1176 + 1177 + 4.3.1. Queries and responses 1178 + 1179 + The principal activity of name servers is to answer standard queries. 1180 + Both the query and its response are carried in a standard message format 1181 + which is described in [RFC-1035]. The query contains a QTYPE, QCLASS, 1182 + and QNAME, which describe the types and classes of desired information 1183 + and the name of interest.
1184 + 1185 + The way that the name server answers the query depends upon whether it 1186 + is operating in recursive mode or not: 1187 + 1188 + - The simplest mode for the server is non-recursive, since it 1189 + can answer queries using only local information: the response 1190 + contains an error, the answer, or a referral to some other 1191 + server "closer" to the answer. All name servers must 1192 + implement non-recursive queries. 1193 + 1194 + - The simplest mode for the client is recursive, since in this 1195 + mode the name server acts in the role of a resolver and 1196 + returns either an error or the answer, but never referrals. 1197 + This service is optional in a name server, and the name server 1198 + may also choose to restrict the clients which can use 1199 + recursive mode. 1200 + 1201 + Recursive service is helpful in several situations: 1202 + 1203 + - a relatively simple requester that lacks the ability to use 1204 + anything other than a direct answer to the question. 1205 + 1206 + - a request that needs to cross protocol or other boundaries and 1207 + can be sent to a server which can act as intermediary. 1208 + 1209 + - a network where we want to concentrate the cache rather than 1210 + having a separate cache for each client. 1211 + 1212 + Non-recursive service is appropriate if the requester is capable of 1213 + pursuing referrals and interested in information which will aid future 1214 + requests. 1215 + 1216 + The use of recursive mode is limited to cases where both the client and 1217 + the name server agree to its use. The agreement is negotiated through 1218 + the use of two bits in query and response messages: 1219 + 1220 + - The recursion available, or RA bit, is set or cleared by a 1221 + name server in all responses. The bit is true if the name 1222 + server is willing to provide recursive service for the client, 1223 + regardless of whether the client requested recursive service. 
1224 + That is, RA signals availability rather than use. 1225 + 1226 + 1227 + 1228 + Mockapetris [Page 22] 1229 + 1230 + RFC 1034 Domain Concepts and Facilities November 1987 1231 + 1232 + 1233 + - Queries contain a bit called recursion desired or RD. This 1234 + bit specifies whether the requester wants recursive 1235 + service for this query. Clients may request recursive service 1236 + from any name server, though they should depend upon receiving 1237 + it only from servers which have previously sent an RA, or 1238 + servers which have agreed to provide service through private 1239 + agreement or some other means outside of the DNS protocol. 1240 + 1241 + The recursive mode occurs when a query with RD set arrives at a server 1242 + which is willing to provide recursive service; the client can verify 1243 + that recursive mode was used by checking that both RA and RD are set in 1244 + the reply. Note that the name server should never perform recursive 1245 + service unless asked via RD, since this interferes with trouble shooting 1246 + of name servers and their databases. 1247 + 1248 + If recursive service is requested and available, the recursive response 1249 + to a query will be one of the following: 1250 + 1251 + - The answer to the query, possibly prefaced by one or more CNAME 1252 + RRs that specify aliases encountered on the way to an answer. 1253 + 1254 + - A name error indicating that the name does not exist. This 1255 + may include CNAME RRs that indicate that the original query 1256 + name was an alias for a name which does not exist. 1257 + 1258 + - A temporary error indication. 1259 + 1260 + If recursive service is not requested or is not available, the non- 1261 + recursive response will be one of the following: 1262 + 1263 + - An authoritative name error indicating that the name does not 1264 + exist. 1265 + 1266 + - A temporary error indication.
1267 + 1268 + - Some combination of: 1269 + 1270 + RRs that answer the question, together with an indication 1271 + whether the data comes from a zone or is cached. 1272 + 1273 + A referral to name servers which have zones which are closer 1274 + ancestors to the name than the server sending the reply. 1275 + 1276 + - RRs that the name server thinks will prove useful to the 1277 + requester. 1278 + 1279 + 1280 + 1281 + 1282 + 1283 + 1284 + Mockapetris [Page 23] 1285 + 1286 + RFC 1034 Domain Concepts and Facilities November 1987 1287 + 1288 + 1289 + 4.3.2. Algorithm 1290 + 1291 + The actual algorithm used by the name server will depend on the local OS 1292 + and data structures used to store RRs. The following algorithm assumes 1293 + that the RRs are organized in several tree structures, one for each 1294 + zone, and another for the cache: 1295 + 1296 + 1. Set or clear the value of recursion available in the response 1297 + depending on whether the name server is willing to provide 1298 + recursive service. If recursive service is available and 1299 + requested via the RD bit in the query, go to step 5, 1300 + otherwise step 2. 1301 + 1302 + 2. Search the available zones for the zone which is the nearest 1303 + ancestor to QNAME. If such a zone is found, go to step 3, 1304 + otherwise step 4. 1305 + 1306 + 3. Start matching down, label by label, in the zone. The 1307 + matching process can terminate several ways: 1308 + 1309 + a. If the whole of QNAME is matched, we have found the 1310 + node. 1311 + 1312 + If the data at the node is a CNAME, and QTYPE doesn't 1313 + match CNAME, copy the CNAME RR into the answer section 1314 + of the response, change QNAME to the canonical name in 1315 + the CNAME RR, and go back to step 1. 1316 + 1317 + Otherwise, copy all RRs which match QTYPE into the 1318 + answer section and go to step 6. 1319 + 1320 + b. If a match would take us out of the authoritative data, 1321 + we have a referral. 
This happens when we encounter a 1322 + node with NS RRs marking cuts along the bottom of a 1323 + zone. 1324 + 1325 + Copy the NS RRs for the subzone into the authority 1326 + section of the reply. Put whatever addresses are 1327 + available into the additional section, using glue RRs 1328 + if the addresses are not available from authoritative 1329 + data or the cache. Go to step 4. 1330 + 1331 + c. If at some label, a match is impossible (i.e., the 1332 + corresponding label does not exist), look to see if 1333 + the "*" label exists. 1334 + 1335 + If the "*" label does not exist, check whether the name 1336 + we are looking for is the original QNAME in the query 1337 + 1338 + 1339 + 1340 + Mockapetris [Page 24] 1341 + 1342 + RFC 1034 Domain Concepts and Facilities November 1987 1343 + 1344 + 1345 + or a name we have followed due to a CNAME. If the name 1346 + is original, set an authoritative name error in the 1347 + response and exit. Otherwise just exit. 1348 + 1349 + If the "*" label does exist, match RRs at that node 1350 + against QTYPE. If any match, copy them into the answer 1351 + section, but set the owner of the RR to be QNAME, and 1352 + not the node with the "*" label. Go to step 6. 1353 + 1354 + 4. Start matching down in the cache. If QNAME is found in the 1355 + cache, copy all RRs attached to it that match QTYPE into the 1356 + answer section. If there was no delegation from 1357 + authoritative data, look for the best one from the cache, and 1358 + put it in the authority section. Go to step 6. 1359 + 1360 + 5. Use the local resolver or a copy of its algorithm (see 1361 + resolver section of this memo) to answer the query. Store 1362 + the results, including any intermediate CNAMEs, in the answer 1363 + section of the response. 1364 + 1365 + 6. Using local data only, attempt to add other RRs which may be 1366 + useful to the additional section of the query. Exit. 1367 + 1368 + 4.3.3.
Wildcards 1369 + 1370 + In the previous algorithm, special treatment was given to RRs with owner 1371 + names starting with the label "*". Such RRs are called wildcards. 1372 + Wildcard RRs can be thought of as instructions for synthesizing RRs. 1373 + When the appropriate conditions are met, the name server creates RRs 1374 + with an owner name equal to the query name and contents taken from the 1375 + wildcard RRs. 1376 + 1377 + This facility is most often used to create a zone which will be used to 1378 + forward mail from the Internet to some other mail system. The general 1379 + idea is that any name in that zone which is presented to server in a 1380 + query will be assumed to exist, with certain properties, unless explicit 1381 + evidence exists to the contrary. Note that the use of the term zone 1382 + here, instead of domain, is intentional; such defaults do not propagate 1383 + across zone boundaries, although a subzone may choose to achieve that 1384 + appearance by setting up similar defaults. 1385 + 1386 + The contents of the wildcard RRs follows the usual rules and formats for 1387 + RRs. The wildcards in the zone have an owner name that controls the 1388 + query names they will match. The owner name of the wildcard RRs is of 1389 + the form "*.<anydomain>", where <anydomain> is any domain name. 1390 + <anydomain> should not contain other * labels, and should be in the 1391 + authoritative data of the zone. The wildcards potentially apply to 1392 + descendants of <anydomain>, but not to <anydomain> itself. Another way 1393 + 1394 + 1395 + 1396 + Mockapetris [Page 25] 1397 + 1398 + RFC 1034 Domain Concepts and Facilities November 1987 1399 + 1400 + 1401 + to look at this is that the "*" label always matches at least one whole 1402 + label and sometimes more, but always whole labels. 1403 + 1404 + Wildcard RRs do not apply: 1405 + 1406 + - When the query is in another zone. That is, delegation cancels 1407 + the wildcard defaults. 
1408 + 1409 + - When the query name or a name between the wildcard domain and 1410 + the query name is known to exist. For example, if a wildcard 1411 + RR has an owner name of "*.X", and the zone also contains RRs 1412 + attached to B.X, the wildcards would apply to queries for name 1413 + Z.X (presuming there is no explicit information for Z.X), but 1414 + not to B.X, A.B.X, or X. 1415 + 1416 + A * label appearing in a query name has no special effect, but can be 1417 + used to test for wildcards in an authoritative zone; such a query is the 1418 + only way to get a response containing RRs with an owner name with * in 1419 + it. The result of such a query should not be cached. 1420 + 1421 + Note that the contents of the wildcard RRs are not modified when used to 1422 + synthesize RRs. 1423 + 1424 + To illustrate the use of wildcard RRs, suppose a large company with a 1425 + large, non-IP/TCP, network wanted to create a mail gateway. If the 1426 + company was called X.COM, and the IP/TCP capable gateway machine was called 1427 + A.X.COM, the following RRs might be entered into the COM zone: 1428 + 1429 + X.COM MX 10 A.X.COM 1430 + 1431 + *.X.COM MX 10 A.X.COM 1432 + 1433 + A.X.COM A 1.2.3.4 1434 + A.X.COM MX 10 A.X.COM 1435 + 1436 + *.A.X.COM MX 10 A.X.COM 1437 + 1438 + This would cause any MX query for any domain name ending in X.COM to 1439 + return an MX RR pointing at A.X.COM. Two wildcard RRs are required 1440 + since the effect of the wildcard at *.X.COM is inhibited in the A.X.COM 1441 + subtree by the explicit data for A.X.COM. Note also that the explicit 1442 + MX data at X.COM and A.X.COM is required, and that none of the RRs above 1443 + would match a query name of XX.COM. 1444 + 1445 + 4.3.4. Negative response caching (Optional) 1446 + 1447 + The DNS provides an optional service which allows name servers to 1448 + distribute, and resolvers to cache, negative results with TTLs.
For 1449 + 1450 + 1451 + 1452 + Mockapetris [Page 26] 1453 + 1454 + RFC 1034 Domain Concepts and Facilities November 1987 1455 + 1456 + 1457 + example, a name server can distribute a TTL along with a name error 1458 + indication, and a resolver receiving such information is allowed to 1459 + assume that the name does not exist during the TTL period without 1460 + consulting authoritative data. Similarly, a resolver can make a query 1461 + with a QTYPE which matches multiple types, and cache the fact that some 1462 + of the types are not present. 1463 + 1464 + This feature can be particularly important in a system which implements 1465 + naming shorthands that use search lists because a popular shorthand, 1466 + which happens to require a suffix toward the end of the search list, 1467 + will generate multiple name errors whenever it is used. 1468 + 1469 + The method is that a name server may add an SOA RR to the additional 1470 + section of a response when that response is authoritative. The SOA must 1471 + be that of the zone which was the source of the authoritative data in 1472 + the answer section, or name error if applicable. The MINIMUM field of 1473 + the SOA controls the length of time that the negative result may be 1474 + cached. 1475 + 1476 + Note that in some circumstances, the answer section may contain multiple 1477 + owner names. In this case, the SOA mechanism should only be used for 1478 + the data which matches QNAME, which is the only authoritative data in 1479 + this section. 1480 + 1481 + Name servers and resolvers should never attempt to add SOAs to the 1482 + additional section of a non-authoritative response, or attempt to infer 1483 + results which are not directly stated in an authoritative response.
1484 + There are several reasons for this, including: cached information isn't 1485 + usually enough to match up RRs and their zone names, SOA RRs may be 1486 + cached due to direct SOA queries, and name servers are not required to 1487 + output the SOAs in the authority section. 1488 + 1489 + This feature is optional, although a refined version is expected to 1490 + become part of the standard protocol in the future. Name servers are 1491 + not required to add the SOA RRs in all authoritative responses, nor are 1492 + resolvers required to cache negative results. Both are recommended. 1493 + All resolvers and recursive name servers are required to at least be 1494 + able to ignore the SOA RR when it is present in a response. 1495 + 1496 + Some experiments have also been proposed which will use this feature. 1497 + The idea is that if cached data is known to come from a particular zone, 1498 + and if an authoritative copy of the zone's SOA is obtained, and if the 1499 + zone's SERIAL has not changed since the data was cached, then the TTL of 1500 + the cached data can be reset to the zone MINIMUM value if it is smaller. 1501 + This usage is mentioned for planning purposes only, and is not 1502 + recommended as yet. 1503 + 1504 + 1505 + 1506 + 1507 + 1508 + Mockapetris [Page 27] 1509 + 1510 + RFC 1034 Domain Concepts and Facilities November 1987 1511 + 1512 + 1513 + 4.3.5. Zone maintenance and transfers 1514 + 1515 + Part of the job of a zone administrator is to maintain the zones at all 1516 + of the name servers which are authoritative for the zone. When the 1517 + inevitable changes are made, they must be distributed to all of the name 1518 + servers. While this distribution can be accomplished using FTP or some 1519 + other ad hoc procedure, the preferred method is the zone transfer part 1520 + of the DNS protocol. 
1521 + 1522 + The general model of automatic zone transfer or refreshing is that one 1523 + of the name servers is the master or primary for the zone. Changes are 1524 + coordinated at the primary, typically by editing a master file for the 1525 + zone. After editing, the administrator signals the master server to 1526 + load the new zone. The other non-master or secondary servers for the 1527 + zone periodically check for changes (at a selectable interval) and 1528 + obtain new zone copies when changes have been made. 1529 + 1530 + To detect changes, secondaries just check the SERIAL field of the SOA 1531 + for the zone. In addition to whatever other changes are made, the 1532 + SERIAL field in the SOA of the zone is always advanced whenever any 1533 + change is made to the zone. The advancing can be a simple increment, or 1534 + could be based on the write date and time of the master file, etc. The 1535 + purpose is to make it possible to determine which of two copies of a 1536 + zone is more recent by comparing serial numbers. Serial number advances 1537 + and comparisons use sequence space arithmetic, so there is a theoretic 1538 + limit on how fast a zone can be updated, basically that old copies must 1539 + die out before the serial number covers half of its 32 bit range. In 1540 + practice, the only concern is that the compare operation deals properly 1541 + with comparisons around the boundary between the most positive and most 1542 + negative 32 bit numbers. 1543 + 1544 + The periodic polling of the secondary servers is controlled by 1545 + parameters in the SOA RR for the zone, which set the minimum acceptable 1546 + polling intervals. The parameters are called REFRESH, RETRY, and 1547 + EXPIRE. Whenever a new zone is loaded in a secondary, the secondary 1548 + waits REFRESH seconds before checking with the primary for a new serial. 1549 + If this check cannot be completed, new checks are started every RETRY 1550 + seconds. 
The check is a simple query to the primary for the SOA RR of 1551 + the zone. If the serial field in the secondary's zone copy is equal to 1552 + the serial returned by the primary, then no changes have occurred, and 1553 + the REFRESH interval wait is restarted. If the secondary finds it 1554 + impossible to perform a serial check for the EXPIRE interval, it must 1555 + assume that its copy of the zone is obsolete and discard it. 1556 + 1557 + When the poll shows that the zone has changed, then the secondary server 1558 + must request a zone transfer via an AXFR request for the zone. The AXFR 1559 + may cause an error, such as refused, but normally is answered by a 1560 + sequence of response messages. The first and last messages must contain 1561 + 1562 + 1563 + 1564 + Mockapetris [Page 28] 1565 + 1566 + RFC 1034 Domain Concepts and Facilities November 1987 1567 + 1568 + 1569 + the data for the top authoritative node of the zone. Intermediate 1570 + messages carry all of the other RRs from the zone, including both 1571 + authoritative and non-authoritative RRs. The stream of messages allows 1572 + the secondary to construct a copy of the zone. Because accuracy is 1573 + essential, TCP or some other reliable protocol must be used for AXFR 1574 + requests. 1575 + 1576 + Each secondary server is required to perform the following operations 1577 + against the master, but may also optionally perform these operations 1578 + against other secondary servers. This strategy can improve the transfer 1579 + process when the primary is unavailable due to host downtime or network 1580 + problems, or when a secondary server has better network access to an 1581 + "intermediate" secondary than to the primary. 1582 + 1583 + 5. RESOLVERS 1584 + 1585 + 5.1. Introduction 1586 + 1587 + Resolvers are programs that interface user programs to domain name 1588 + servers.
In the simplest case, a resolver receives a request from a 1589 + user program (e.g., mail programs, TELNET, FTP) in the form of a 1590 + subroutine call, system call etc., and returns the desired information 1591 + in a form compatible with the local host's data formats. 1592 + 1593 + The resolver is located on the same machine as the program that requests 1594 + the resolver's services, but it may need to consult name servers on 1595 + other hosts. Because a resolver may need to consult several name 1596 + servers, or may have the requested information in a local cache, the 1597 + amount of time that a resolver will take to complete can vary quite a 1598 + bit, from milliseconds to several seconds. 1599 + 1600 + A very important goal of the resolver is to eliminate network delay and 1601 + name server load from most requests by answering them from its cache of 1602 + prior results. It follows that caches which are shared by multiple 1603 + processes, users, machines, etc., are more efficient than non-shared 1604 + caches. 1605 + 1606 + 5.2. Client-resolver interface 1607 + 1608 + 5.2.1. Typical functions 1609 + 1610 + The client interface to the resolver is influenced by the local host's 1611 + conventions, but the typical resolver-client interface has three 1612 + functions: 1613 + 1614 + 1. Host name to host address translation. 1615 + 1616 + This function is often defined to mimic a previous HOSTS.TXT 1617 + 1618 + 1619 + 1620 + Mockapetris [Page 29] 1621 + 1622 + RFC 1034 Domain Concepts and Facilities November 1987 1623 + 1624 + 1625 + based function. Given a character string, the caller wants 1626 + one or more 32 bit IP addresses. Under the DNS, it 1627 + translates into a request for type A RRs. Since the DNS does 1628 + not preserve the order of RRs, this function may choose to 1629 + sort the returned addresses or select the "best" address if 1630 + the service returns only one choice to the client. 
Note that 1631 + a multiple address return is recommended, but a single 1632 + address may be the only way to emulate prior HOSTS.TXT 1633 + services. 1634 + 1635 + 2. Host address to host name translation 1636 + 1637 + This function will often follow the form of previous 1638 + functions. Given a 32 bit IP address, the caller wants a 1639 + character string. The octets of the IP address are reversed, 1640 + used as name components, and suffixed with "IN-ADDR.ARPA". A 1641 + type PTR query is used to get the RR with the primary name of 1642 + the host. For example, a request for the host name 1643 + corresponding to IP address 1.2.3.4 looks for PTR RRs for 1644 + domain name "4.3.2.1.IN-ADDR.ARPA". 1645 + 1646 + 3. General lookup function 1647 + 1648 + This function retrieves arbitrary information from the DNS, 1649 + and has no counterpart in previous systems. The caller 1650 + supplies a QNAME, QTYPE, and QCLASS, and wants all of the 1651 + matching RRs. This function will often use the DNS format 1652 + for all RR data instead of the local host's, and returns all 1653 + RR content (e.g., TTL) instead of a processed form with local 1654 + quoting conventions. 1655 + 1656 + When the resolver performs the indicated function, it usually has one of 1657 + the following results to pass back to the client: 1658 + 1659 + - One or more RRs giving the requested data. 1660 + 1661 + In this case the resolver returns the answer in the 1662 + appropriate format. 1663 + 1664 + - A name error (NE). 1665 + 1666 + This happens when the referenced name does not exist. For 1667 + example, a user may have mistyped a host name. 1668 + 1669 + - A data not found error. 1670 + 1671 + This happens when the referenced name exists, but data of the 1672 + appropriate type does not. 
For example, a host address 1673 + 1674 + 1675 + 1676 + Mockapetris [Page 30] 1677 + 1678 + RFC 1034 Domain Concepts and Facilities November 1987 1679 + 1680 + 1681 + function applied to a mailbox name would return this error 1682 + since the name exists, but no address RR is present. 1683 + 1684 + It is important to note that the functions for translating between host 1685 + names and addresses may combine the "name error" and "data not found" 1686 + error conditions into a single type of error return, but the general 1687 + function should not. One reason for this is that applications may ask 1688 + first for one type of information about a name followed by a second 1689 + request to the same name for some other type of information; if the two 1690 + errors are combined, then useless queries may slow the application. 1691 + 1692 + 5.2.2. Aliases 1693 + 1694 + While attempting to resolve a particular request, the resolver may find 1695 + that the name in question is an alias. For example, the resolver might 1696 + find that the name given for host name to address translation is an 1697 + alias when it finds the CNAME RR. If possible, the alias condition 1698 + should be signalled back from the resolver to the client. 1699 + 1700 + In most cases a resolver simply restarts the query at the new name when 1701 + it encounters a CNAME. However, when performing the general function, 1702 + the resolver should not pursue aliases when the CNAME RR matches the 1703 + query type. This allows queries which ask whether an alias is present. 1704 + For example, if the query type is CNAME, the user is interested in the 1705 + CNAME RR itself, and not the RRs at the name it points to. 1706 + 1707 + Several special conditions can occur with aliases. Multiple levels of 1708 + aliases should be avoided due to their lack of efficiency, but should 1709 + not be signalled as an error. 
Alias loops and aliases which point to 1710 + non-existent names should be caught and an error condition passed back 1711 + to the client. 1712 + 1713 + 5.2.3. Temporary failures 1714 + 1715 + In a less than perfect world, all resolvers will occasionally be unable 1716 + to resolve a particular request. This condition can be caused by a 1717 + resolver which becomes separated from the rest of the network due to a 1718 + link failure or gateway problem, or less often by coincident failure or 1719 + unavailability of all servers for a particular domain. 1720 + 1721 + It is essential that this sort of condition should not be signalled as a 1722 + name or data not present error to applications. This sort of behavior 1723 + is annoying to humans, and can wreak havoc when mail systems use the 1724 + DNS. 1725 + 1726 + While in some cases it is possible to deal with such a temporary problem 1727 + by blocking the request indefinitely, this is usually not a good choice, 1728 + particularly when the client is a server process that could move on to 1729 + 1730 + 1731 + 1732 + Mockapetris [Page 31] 1733 + 1734 + RFC 1034 Domain Concepts and Facilities November 1987 1735 + 1736 + 1737 + other tasks. The recommended solution is to always have temporary 1738 + failure as one of the possible results of a resolver function, even 1739 + though this may make emulation of existing HOSTS.TXT functions more 1740 + difficult. 1741 + 1742 + 5.3. Resolver internals 1743 + 1744 + Every resolver implementation uses slightly different algorithms, and 1745 + typically spends much more logic dealing with errors of various sorts 1746 + than typical occurrences. This section outlines a recommended basic 1747 + strategy for resolver operation, but leaves details to [RFC-1035]. 1748 + 1749 + 5.3.1. Stub resolvers 1750 + 1751 + One option for implementing a resolver is to move the resolution 1752 + function out of the local machine and into a name server which supports 1753 + recursive queries.
This can provide an easy method of providing domain 1754 + service in a PC which lacks the resources to perform the resolver 1755 + function, or can centralize the cache for a whole local network or 1756 + organization. 1757 + 1758 + All that the remaining stub needs is a list of name server addresses 1759 + that will perform the recursive requests. This type of resolver 1760 + presumably needs the information in a configuration file, since it 1761 + probably lacks the sophistication to locate it in the domain database. 1762 + The user also needs to verify that the listed servers will perform the 1763 + recursive service; a name server is free to refuse to perform recursive 1764 + services for any or all clients. The user should consult the local 1765 + system administrator to find name servers willing to perform the 1766 + service. 1767 + 1768 + This type of service suffers from some drawbacks. Since the recursive 1769 + requests may take an arbitrary amount of time to perform, the stub may 1770 + have difficulty optimizing retransmission intervals to deal with both 1771 + lost UDP packets and dead servers; the name server can be easily 1772 + overloaded by too zealous a stub if it interprets retransmissions as new 1773 + requests. Use of TCP may be an answer, but TCP may well place burdens 1774 + on the host's capabilities which are similar to those of a real 1775 + resolver. 1776 + 1777 + 5.3.2. Resources 1778 + 1779 + In addition to its own resources, the resolver may also have shared 1780 + access to zones maintained by a local name server. This gives the 1781 + resolver the advantage of more rapid access, but the resolver must be 1782 + careful to never let cached information override zone data. 
In this 1783 + discussion the term "local information" is meant to mean the union of 1784 + the cache and such shared zones, with the understanding that 1785 + 1786 + 1787 + 1788 + Mockapetris [Page 32] 1789 + 1790 + RFC 1034 Domain Concepts and Facilities November 1987 1791 + 1792 + 1793 + authoritative data is always used in preference to cached data when both 1794 + are present. 1795 + 1796 + The following resolver algorithm assumes that all functions have been 1797 + converted to a general lookup function, and uses the following data 1798 + structures to represent the state of a request in progress in the 1799 + resolver: 1800 + 1801 + SNAME the domain name we are searching for. 1802 + 1803 + STYPE the QTYPE of the search request. 1804 + 1805 + SCLASS the QCLASS of the search request. 1806 + 1807 + SLIST a structure which describes the name servers and the 1808 + zone which the resolver is currently trying to query. 1809 + This structure keeps track of the resolver's current 1810 + best guess about which name servers hold the desired 1811 + information; it is updated when arriving information 1812 + changes the guess. This structure includes the 1813 + equivalent of a zone name, the known name servers for 1814 + the zone, the known addresses for the name servers, and 1815 + history information which can be used to suggest which 1816 + server is likely to be the best one to try next. The 1817 + zone name equivalent is a match count of the number of 1818 + labels from the root down which SNAME has in common with 1819 + the zone being queried; this is used as a measure of how 1820 + "close" the resolver is to SNAME. 1821 + 1822 + SBELT a "safety belt" structure of the same form as SLIST, 1823 + which is initialized from a configuration file, and 1824 + lists servers which should be used when the resolver 1825 + doesn't have any local information to guide name server 1826 + selection. The match count will be -1 to indicate that 1827 + no labels are known to match. 
1828 + 1829 + CACHE A structure which stores the results from previous 1830 + responses. Since resolvers are responsible for 1831 + discarding old RRs whose TTL has expired, most 1832 + implementations convert the interval specified in 1833 + arriving RRs to some sort of absolute time when the RR 1834 + is stored in the cache. Instead of counting the TTLs 1835 + down individually, the resolver just ignores or discards 1836 + old RRs when it runs across them in the course of a 1837 + search, or discards them during periodic sweeps to 1838 + reclaim the memory consumed by old RRs. 1839 + 1840 + 1841 + 1842 + 1843 + 1844 + Mockapetris [Page 33] 1845 + 1846 + RFC 1034 Domain Concepts and Facilities November 1987 1847 + 1848 + 1849 + 5.3.3. Algorithm 1850 + 1851 + The top level algorithm has four steps: 1852 + 1853 + 1. See if the answer is in local information, and if so return 1854 + it to the client. 1855 + 1856 + 2. Find the best servers to ask. 1857 + 1858 + 3. Send them queries until one returns a response. 1859 + 1860 + 4. Analyze the response, either: 1861 + 1862 + a. if the response answers the question or contains a name 1863 + error, cache the data as well as returning it back to 1864 + the client. 1865 + 1866 + b. if the response contains a better delegation to other 1867 + servers, cache the delegation information, and go to 1868 + step 2. 1869 + 1870 + c. if the response shows a CNAME and that is not the 1871 + answer itself, cache the CNAME, change the SNAME to the 1872 + canonical name in the CNAME RR and go to step 1. 1873 + 1874 + d. if the response shows a server failure or other 1875 + bizarre contents, delete the server from the SLIST and 1876 + go back to step 3. 1877 + 1878 + Step 1 searches the cache for the desired data. If the data is in the 1879 + cache, it is assumed to be good enough for normal use.
Some resolvers 1880 + have an option at the user interface which will force the resolver to 1881 + ignore the cached data and consult with an authoritative server. This 1882 + is not recommended as the default. If the resolver has direct access to 1883 + a name server's zones, it should check to see if the desired data is 1884 + present in authoritative form, and if so, use the authoritative data in 1885 + preference to cached data. 1886 + 1887 + Step 2 looks for a name server to ask for the required data. The 1888 + general strategy is to look for locally-available name server RRs, 1889 + starting at SNAME, then the parent domain name of SNAME, the 1890 + grandparent, and so on toward the root. Thus if SNAME were 1891 + Mockapetris.ISI.EDU, this step would look for NS RRs for 1892 + Mockapetris.ISI.EDU, then ISI.EDU, then EDU, and then . (the root). 1893 + These NS RRs list the names of hosts for a zone at or above SNAME. Copy 1894 + the names into SLIST. Set up their addresses using local data. It may 1895 + be the case that the addresses are not available. The resolver has many 1896 + choices here; the best is to start parallel resolver processes looking 1897 + 1898 + 1899 + 1900 + Mockapetris [Page 34] 1901 + 1902 + RFC 1034 Domain Concepts and Facilities November 1987 1903 + 1904 + 1905 + for the addresses while continuing onward with the addresses which are 1906 + available. Obviously, the design choices and options are complicated 1907 + and a function of the local host's capabilities. The recommended 1908 + priorities for the resolver designer are: 1909 + 1910 + 1. Bound the amount of work (packets sent, parallel processes 1911 + started) so that a request can't get into an infinite loop or 1912 + start off a chain reaction of requests or queries with other 1913 + implementations EVEN IF SOMEONE HAS INCORRECTLY CONFIGURED 1914 + SOME DATA. 1915 + 1916 + 2. Get back an answer if at all possible. 1917 + 1918 + 3. Avoid unnecessary transmissions. 
1919 + 1920 + 4. Get the answer as quickly as possible. 1921 + 1922 + If the search for NS RRs fails, then the resolver initializes SLIST from 1923 + the safety belt SBELT. The basic idea is that when the resolver has no 1924 + idea what servers to ask, it should use information from a configuration 1925 + file that lists several servers which are expected to be helpful. 1926 + Although there are special situations, the usual choice is two of the 1927 + root servers and two of the servers for the host's domain. The reason 1928 + for two of each is for redundancy. The root servers will provide 1929 + eventual access to all of the domain space. The two local servers will 1930 + allow the resolver to continue to resolve local names if the local 1931 + network becomes isolated from the internet due to gateway or link 1932 + failure. 1933 + 1934 + In addition to the names and addresses of the servers, the SLIST data 1935 + structure can be sorted to use the best servers first, and to insure 1936 + that all addresses of all servers are used in a round-robin manner. The 1937 + sorting can be a simple function of preferring addresses on the local 1938 + network over others, or may involve statistics from past events, such as 1939 + previous response times and batting averages. 1940 + 1941 + Step 3 sends out queries until a response is received. The strategy is 1942 + to cycle around all of the addresses for all of the servers with a 1943 + timeout between each transmission. In practice it is important to use 1944 + all addresses of a multihomed host, and too aggressive a retransmission 1945 + policy actually slows response when used by multiple resolvers 1946 + contending for the same name server and even occasionally for a single 1947 + resolver. SLIST typically contains data values to control the timeouts 1948 + and keep track of previous transmissions. 1949 + 1950 + Step 4 involves analyzing responses. 
The resolver should be highly 1951 + paranoid in its parsing of responses. It should also check that the 1952 + response matches the query it sent using the ID field in the response. 1953 + 1954 + 1955 + 1956 + Mockapetris [Page 35] 1957 + 1958 + RFC 1034 Domain Concepts and Facilities November 1987 1959 + 1960 + 1961 + The ideal answer is one from a server authoritative for the query which 1962 + either gives the required data or a name error. The data is passed back 1963 + to the user and entered in the cache for future use if its TTL is 1964 + greater than zero. 1965 + 1966 + If the response shows a delegation, the resolver should check to see 1967 + that the delegation is "closer" to the answer than the servers in SLIST 1968 + are. This can be done by comparing the match count in SLIST with that 1969 + computed from SNAME and the NS RRs in the delegation. If not, the reply 1970 + is bogus and should be ignored. If the delegation is valid the NS 1971 + delegation RRs and any address RRs for the servers should be cached. 1972 + The name servers are entered in the SLIST, and the search is restarted. 1973 + 1974 + If the response contains a CNAME, the search is restarted at the CNAME 1975 + unless the response has the data for the canonical name or if the CNAME 1976 + is the answer itself. 1977 + 1978 + Details and implementation hints can be found in [RFC-1035]. 1979 + 1980 + 6. A SCENARIO 1981 + 1982 + In our sample domain space, suppose we wanted separate administrative 1983 + control for the root, MIL, EDU, MIT.EDU and ISI.EDU zones. 
We might 1984 + allocate name servers as follows: 1985 + 1986 + 1987 + |(C.ISI.EDU,SRI-NIC.ARPA 1988 + | A.ISI.EDU) 1989 + +---------------------+------------------+ 1990 + | | | 1991 + MIL EDU ARPA 1992 + |(SRI-NIC.ARPA, |(SRI-NIC.ARPA, | 1993 + | A.ISI.EDU | C.ISI.EDU) | 1994 + +-----+-----+ | +------+-----+-----+ 1995 + | | | | | | | 1996 + BRL NOSC DARPA | IN-ADDR SRI-NIC ACC 1997 + | 1998 + +--------+------------------+---------------+--------+ 1999 + | | | | | 2000 + UCI MIT | UDEL YALE 2001 + |(XX.LCS.MIT.EDU, ISI 2002 + |ACHILLES.MIT.EDU) |(VAXA.ISI.EDU,VENERA.ISI.EDU, 2003 + +---+---+ | A.ISI.EDU) 2004 + | | | 2005 + LCS ACHILLES +--+-----+-----+--------+ 2006 + | | | | | | 2007 + XX A C VAXA VENERA Mockapetris 2008 + 2009 + 2010 + 2011 + 2012 + Mockapetris [Page 36] 2013 + 2014 + RFC 1034 Domain Concepts and Facilities November 1987 2015 + 2016 + 2017 + In this example, the authoritative name server is shown in parentheses 2018 + at the point in the domain tree at which it assumes control. 2019 + 2020 + Thus the root name servers are on C.ISI.EDU, SRI-NIC.ARPA, and 2021 + A.ISI.EDU. The MIL domain is served by SRI-NIC.ARPA and A.ISI.EDU. The 2022 + EDU domain is served by SRI-NIC.ARPA. and C.ISI.EDU. Note that servers 2023 + may have zones which are contiguous or disjoint. In this scenario, 2024 + C.ISI.EDU has contiguous zones at the root and EDU domains. A.ISI.EDU 2025 + has contiguous zones at the root and MIL domains, but also has a non- 2026 + contiguous zone at ISI.EDU. 2027 + 2028 + 6.1. C.ISI.EDU name server 2029 + 2030 + C.ISI.EDU is a name server for the root, MIL, and EDU domains of the IN 2031 + class, and would have zones for these domains. The zone data for the 2032 + root domain might be: 2033 + 2034 + . IN SOA SRI-NIC.ARPA. HOSTMASTER.SRI-NIC.ARPA. ( 2035 + 870611 ;serial 2036 + 1800 ;refresh every 30 min 2037 + 300 ;retry every 5 min 2038 + 604800 ;expire after a week 2039 + 86400) ;minimum of a day 2040 + NS A.ISI.EDU.
2041 + NS C.ISI.EDU. 2042 + NS SRI-NIC.ARPA. 2043 + 2044 + MIL. 86400 NS SRI-NIC.ARPA. 2045 + 86400 NS A.ISI.EDU. 2046 + 2047 + EDU. 86400 NS SRI-NIC.ARPA. 2048 + 86400 NS C.ISI.EDU. 2049 + 2050 + SRI-NIC.ARPA. A 26.0.0.73 2051 + A 10.0.0.51 2052 + MX 0 SRI-NIC.ARPA. 2053 + HINFO DEC-2060 TOPS20 2054 + 2055 + ACC.ARPA. A 26.6.0.65 2056 + HINFO PDP-11/70 UNIX 2057 + MX 10 ACC.ARPA. 2058 + 2059 + USC-ISIC.ARPA. CNAME C.ISI.EDU. 2060 + 2061 + 73.0.0.26.IN-ADDR.ARPA. PTR SRI-NIC.ARPA. 2062 + 65.0.6.26.IN-ADDR.ARPA. PTR ACC.ARPA. 2063 + 51.0.0.10.IN-ADDR.ARPA. PTR SRI-NIC.ARPA. 2064 + 52.0.0.10.IN-ADDR.ARPA. PTR C.ISI.EDU. 2065 + 2066 + 2067 + 2068 + Mockapetris [Page 37] 2069 + 2070 + RFC 1034 Domain Concepts and Facilities November 1987 2071 + 2072 + 2073 + 103.0.3.26.IN-ADDR.ARPA. PTR A.ISI.EDU. 2074 + 2075 + A.ISI.EDU. 86400 A 26.3.0.103 2076 + C.ISI.EDU. 86400 A 10.0.0.52 2077 + 2078 + This data is represented as it would be in a master file. Most RRs are 2079 + single line entries; the sole exception here is the SOA RR, which uses 2080 + "(" to start a multi-line RR and ")" to show the end of a multi-line RR. 2081 + Since the class of all RRs in a zone must be the same, only the first RR 2082 + in a zone need specify the class. When a name server loads a zone, it 2083 + forces the TTL of all authoritative RRs to be at least the MINIMUM field 2084 + of the SOA, here 86400 seconds, or one day. The NS RRs marking 2085 + delegation of the MIL and EDU domains, together with the glue RRs for 2086 + the servers host addresses, are not part of the authoritative data in 2087 + the zone, and hence have explicit TTLs. 2088 + 2089 + Four RRs are attached to the root node: the SOA which describes the root 2090 + zone and the 3 NS RRs which list the name servers for the root. The 2091 + data in the SOA RR describes the management of the zone. The zone data 2092 + is maintained on host SRI-NIC.ARPA, and the responsible party for the 2093 + zone is HOSTMASTER@SRI-NIC.ARPA. 
A key item in the SOA is the 86400 2094 + second minimum TTL, which means that all authoritative data in the zone 2095 + has at least that TTL, although higher values may be explicitly 2096 + specified. 2097 + 2098 + The NS RRs for the MIL and EDU domains mark the boundary between the 2099 + root zone and the MIL and EDU zones. Note that in this example, the 2100 + lower zones happen to be supported by name servers which also support 2101 + the root zone. 2102 + 2103 + The master file for the EDU zone might be stated relative to the origin 2104 + EDU. The zone data for the EDU domain might be: 2105 + 2106 + EDU. IN SOA SRI-NIC.ARPA. HOSTMASTER.SRI-NIC.ARPA. ( 2107 + 870729 ;serial 2108 + 1800 ;refresh every 30 minutes 2109 + 300 ;retry every 5 minutes 2110 + 604800 ;expire after a week 2111 + 86400 ;minimum of a day 2112 + ) 2113 + NS SRI-NIC.ARPA. 2114 + NS C.ISI.EDU. 2115 + 2116 + UCI 172800 NS ICS.UCI 2117 + 172800 NS ROME.UCI 2118 + ICS.UCI 172800 A 192.5.19.1 2119 + ROME.UCI 172800 A 192.5.19.31 2120 + 2121 + 2122 + 2123 + 2124 + Mockapetris [Page 38] 2125 + 2126 + RFC 1034 Domain Concepts and Facilities November 1987 2127 + 2128 + 2129 + ISI 172800 NS VAXA.ISI 2130 + 172800 NS A.ISI 2131 + 172800 NS VENERA.ISI.EDU. 2132 + VAXA.ISI 172800 A 10.2.0.27 2133 + 172800 A 128.9.0.33 2134 + VENERA.ISI.EDU. 172800 A 10.1.0.52 2135 + 172800 A 128.9.0.32 2136 + A.ISI 172800 A 26.3.0.103 2137 + 2138 + UDEL.EDU. 172800 NS LOUIE.UDEL.EDU. 2139 + 172800 NS UMN-REI-UC.ARPA. 2140 + LOUIE.UDEL.EDU. 172800 A 10.0.0.96 2141 + 172800 A 192.5.39.3 2142 + 2143 + YALE.EDU. 172800 NS YALE.ARPA. 2144 + YALE.EDU. 172800 NS YALE-BULLDOG.ARPA. 2145 + 2146 + MIT.EDU. 43200 NS XX.LCS.MIT.EDU. 2147 + 43200 NS ACHILLES.MIT.EDU. 2148 + XX.LCS.MIT.EDU. 43200 A 10.0.0.44 2149 + ACHILLES.MIT.EDU. 43200 A 18.72.0.8 2150 + 2151 + Note the use of relative names here. The owner name for the ISI.EDU. is 2152 + stated using a relative name, as are two of the name server RR contents. 
Relative and absolute domain names may be freely intermixed in a master
file.

6.2. Example standard queries

The following queries and responses illustrate name server behavior.
Unless otherwise noted, the queries do not have recursion desired (RD)
in the header. Note that the answers to non-recursive queries do depend
on the server being asked, but do not depend on the identity of the
requester.

6.2.1. QNAME=SRI-NIC.ARPA, QTYPE=A

The query would look like:

               +---------------------------------------------------+
    Header     | OPCODE=SQUERY                                     |
               +---------------------------------------------------+
    Question   | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=A           |
               +---------------------------------------------------+
    Answer     | <empty>                                           |
               +---------------------------------------------------+
    Authority  | <empty>                                           |
               +---------------------------------------------------+
    Additional | <empty>                                           |
               +---------------------------------------------------+

The response from C.ISI.EDU would be:

               +---------------------------------------------------+
    Header     | OPCODE=SQUERY, RESPONSE, AA                       |
               +---------------------------------------------------+
    Question   | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=A           |
               +---------------------------------------------------+
    Answer     | SRI-NIC.ARPA.
86400 IN A 26.0.0.73 | 2209 + | 86400 IN A 10.0.0.51 | 2210 + +---------------------------------------------------+ 2211 + Authority | <empty> | 2212 + +---------------------------------------------------+ 2213 + Additional | <empty> | 2214 + +---------------------------------------------------+ 2215 + 2216 + The header of the response looks like the header of the query, except 2217 + that the RESPONSE bit is set, indicating that this message is a 2218 + response, not a query, and the Authoritative Answer (AA) bit is set 2219 + indicating that the address RRs in the answer section are from 2220 + authoritative data. The question section of the response matches the 2221 + question section of the query. 2222 + 2223 + 2224 + 2225 + 2226 + 2227 + 2228 + 2229 + 2230 + 2231 + 2232 + 2233 + 2234 + 2235 + 2236 + Mockapetris [Page 40] 2237 + 2238 + RFC 1034 Domain Concepts and Facilities November 1987 2239 + 2240 + 2241 + If the same query was sent to some other server which was not 2242 + authoritative for SRI-NIC.ARPA, the response might be: 2243 + 2244 + +---------------------------------------------------+ 2245 + Header | OPCODE=SQUERY,RESPONSE | 2246 + +---------------------------------------------------+ 2247 + Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=A | 2248 + +---------------------------------------------------+ 2249 + Answer | SRI-NIC.ARPA. 1777 IN A 10.0.0.51 | 2250 + | 1777 IN A 26.0.0.73 | 2251 + +---------------------------------------------------+ 2252 + Authority | <empty> | 2253 + +---------------------------------------------------+ 2254 + Additional | <empty> | 2255 + +---------------------------------------------------+ 2256 + 2257 + This response is different from the previous one in two ways: the header 2258 + does not have AA set, and the TTLs are different. The inference is that 2259 + the data did not come from a zone, but from a cache. 
The difference 2260 + between the authoritative TTL and the TTL here is due to aging of the 2261 + data in a cache. The difference in ordering of the RRs in the answer 2262 + section is not significant. 2263 + 2264 + 6.2.2. QNAME=SRI-NIC.ARPA, QTYPE=* 2265 + 2266 + A query similar to the previous one, but using a QTYPE of *, would 2267 + receive the following response from C.ISI.EDU: 2268 + 2269 + +---------------------------------------------------+ 2270 + Header | OPCODE=SQUERY, RESPONSE, AA | 2271 + +---------------------------------------------------+ 2272 + Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=* | 2273 + +---------------------------------------------------+ 2274 + Answer | SRI-NIC.ARPA. 86400 IN A 26.0.0.73 | 2275 + | A 10.0.0.51 | 2276 + | MX 0 SRI-NIC.ARPA. | 2277 + | HINFO DEC-2060 TOPS20 | 2278 + +---------------------------------------------------+ 2279 + Authority | <empty> | 2280 + +---------------------------------------------------+ 2281 + Additional | <empty> | 2282 + +---------------------------------------------------+ 2283 + 2284 + 2285 + 2286 + 2287 + 2288 + 2289 + 2290 + 2291 + 2292 + Mockapetris [Page 41] 2293 + 2294 + RFC 1034 Domain Concepts and Facilities November 1987 2295 + 2296 + 2297 + If a similar query was directed to two name servers which are not 2298 + authoritative for SRI-NIC.ARPA, the responses might be: 2299 + 2300 + +---------------------------------------------------+ 2301 + Header | OPCODE=SQUERY, RESPONSE | 2302 + +---------------------------------------------------+ 2303 + Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=* | 2304 + +---------------------------------------------------+ 2305 + Answer | SRI-NIC.ARPA. 
                               12345 IN A 26.0.0.73                |
               |                        A 10.0.0.51                |
               +---------------------------------------------------+
    Authority  | <empty>                                           |
               +---------------------------------------------------+
    Additional | <empty>                                           |
               +---------------------------------------------------+

and

               +---------------------------------------------------+
    Header     | OPCODE=SQUERY, RESPONSE                           |
               +---------------------------------------------------+
    Question   | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=*           |
               +---------------------------------------------------+
    Answer     | SRI-NIC.ARPA. 1290 IN HINFO DEC-2060 TOPS20       |
               +---------------------------------------------------+
    Authority  | <empty>                                           |
               +---------------------------------------------------+
    Additional | <empty>                                           |
               +---------------------------------------------------+

Neither of these answers has AA set, so neither response comes from
authoritative data. The different contents and different TTLs suggest
that the two servers cached data at different times, and that the first
server cached the response to a QTYPE=A query and the second cached the
response to a HINFO query.

6.2.3. QNAME=SRI-NIC.ARPA, QTYPE=MX

This type of query might result from a mailer trying to look up
routing information for the mail destination HOSTMASTER@SRI-NIC.ARPA.
2357 + The response from C.ISI.EDU would be: 2358 + 2359 + +---------------------------------------------------+ 2360 + Header | OPCODE=SQUERY, RESPONSE, AA | 2361 + +---------------------------------------------------+ 2362 + Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=MX | 2363 + +---------------------------------------------------+ 2364 + Answer | SRI-NIC.ARPA. 86400 IN MX 0 SRI-NIC.ARPA.| 2365 + +---------------------------------------------------+ 2366 + Authority | <empty> | 2367 + +---------------------------------------------------+ 2368 + Additional | SRI-NIC.ARPA. 86400 IN A 26.0.0.73 | 2369 + | A 10.0.0.51 | 2370 + +---------------------------------------------------+ 2371 + 2372 + This response contains the MX RR in the answer section of the response. 2373 + The additional section contains the address RRs because the name server 2374 + at C.ISI.EDU guesses that the requester will need the addresses in order 2375 + to properly use the information carried by the MX. 2376 + 2377 + 6.2.4. QNAME=SRI-NIC.ARPA, QTYPE=NS 2378 + 2379 + C.ISI.EDU would reply to this query with: 2380 + 2381 + +---------------------------------------------------+ 2382 + Header | OPCODE=SQUERY, RESPONSE, AA | 2383 + +---------------------------------------------------+ 2384 + Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=NS | 2385 + +---------------------------------------------------+ 2386 + Answer | <empty> | 2387 + +---------------------------------------------------+ 2388 + Authority | <empty> | 2389 + +---------------------------------------------------+ 2390 + Additional | <empty> | 2391 + +---------------------------------------------------+ 2392 + 2393 + The only difference between the response and the query is the AA and 2394 + RESPONSE bits in the header. The interpretation of this response is 2395 + that the server is authoritative for the name, and the name exists, but 2396 + no RRs of type NS are present there. 2397 + 2398 + 6.2.5. 
QNAME=SIR-NIC.ARPA, QTYPE=A 2399 + 2400 + If a user mistyped a host name, we might see this type of query. 2401 + 2402 + 2403 + 2404 + Mockapetris [Page 43] 2405 + 2406 + RFC 1034 Domain Concepts and Facilities November 1987 2407 + 2408 + 2409 + C.ISI.EDU would answer it with: 2410 + 2411 + +---------------------------------------------------+ 2412 + Header | OPCODE=SQUERY, RESPONSE, AA, RCODE=NE | 2413 + +---------------------------------------------------+ 2414 + Question | QNAME=SIR-NIC.ARPA., QCLASS=IN, QTYPE=A | 2415 + +---------------------------------------------------+ 2416 + Answer | <empty> | 2417 + +---------------------------------------------------+ 2418 + Authority | . SOA SRI-NIC.ARPA. HOSTMASTER.SRI-NIC.ARPA. | 2419 + | 870611 1800 300 604800 86400 | 2420 + +---------------------------------------------------+ 2421 + Additional | <empty> | 2422 + +---------------------------------------------------+ 2423 + 2424 + This response states that the name does not exist. This condition is 2425 + signalled in the response code (RCODE) section of the header. 2426 + 2427 + The SOA RR in the authority section is the optional negative caching 2428 + information which allows the resolver using this response to assume that 2429 + the name will not exist for the SOA MINIMUM (86400) seconds. 2430 + 2431 + 6.2.6. QNAME=BRL.MIL, QTYPE=A 2432 + 2433 + If this query is sent to C.ISI.EDU, the reply would be: 2434 + 2435 + +---------------------------------------------------+ 2436 + Header | OPCODE=SQUERY, RESPONSE | 2437 + +---------------------------------------------------+ 2438 + Question | QNAME=BRL.MIL, QCLASS=IN, QTYPE=A | 2439 + +---------------------------------------------------+ 2440 + Answer | <empty> | 2441 + +---------------------------------------------------+ 2442 + Authority | MIL. 86400 IN NS SRI-NIC.ARPA. | 2443 + | 86400 NS A.ISI.EDU. | 2444 + +---------------------------------------------------+ 2445 + Additional | A.ISI.EDU. 
A 26.3.0.103 | 2446 + | SRI-NIC.ARPA. A 26.0.0.73 | 2447 + | A 10.0.0.51 | 2448 + +---------------------------------------------------+ 2449 + 2450 + This response has an empty answer section, but is not authoritative, so 2451 + it is a referral. The name server on C.ISI.EDU, realizing that it is 2452 + not authoritative for the MIL domain, has referred the requester to 2453 + servers on A.ISI.EDU and SRI-NIC.ARPA, which it knows are authoritative 2454 + for the MIL domain. 2455 + 2456 + 2457 + 2458 + 2459 + 2460 + Mockapetris [Page 44] 2461 + 2462 + RFC 1034 Domain Concepts and Facilities November 1987 2463 + 2464 + 2465 + 6.2.7. QNAME=USC-ISIC.ARPA, QTYPE=A 2466 + 2467 + The response to this query from A.ISI.EDU would be: 2468 + 2469 + +---------------------------------------------------+ 2470 + Header | OPCODE=SQUERY, RESPONSE, AA | 2471 + +---------------------------------------------------+ 2472 + Question | QNAME=USC-ISIC.ARPA., QCLASS=IN, QTYPE=A | 2473 + +---------------------------------------------------+ 2474 + Answer | USC-ISIC.ARPA. 86400 IN CNAME C.ISI.EDU. | 2475 + | C.ISI.EDU. 86400 IN A 10.0.0.52 | 2476 + +---------------------------------------------------+ 2477 + Authority | <empty> | 2478 + +---------------------------------------------------+ 2479 + Additional | <empty> | 2480 + +---------------------------------------------------+ 2481 + 2482 + Note that the AA bit in the header guarantees that the data matching 2483 + QNAME is authoritative, but does not say anything about whether the data 2484 + for C.ISI.EDU is authoritative. This complete reply is possible because 2485 + A.ISI.EDU happens to be authoritative for both the ARPA domain where 2486 + USC-ISIC.ARPA is found and the ISI.EDU domain where C.ISI.EDU data is 2487 + found. 
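Seen from the resolver's side, the reply above is the case where a CNAME response already carries the data for the canonical name, so the search does not need to be restarted. The OCaml sketch below illustrates that decision under stated assumptions: the types and the function `follow_cname` are hypothetical names invented for this illustration, only A and CNAME data are modelled, only a single alias hop is followed, and `List.find_map` assumes OCaml 4.10 or later.

```ocaml
(* Hypothetical, minimal model of the RRs used in this example. *)
type rdata =
  | A of string      (* dotted-quad address, kept as text *)
  | Cname of string  (* canonical name the alias points to *)

type rr = { owner : string; ttl : int; rdata : rdata }

type qtype = Q_A | Q_CNAME

type step =
  | Done of rr list       (* these RRs answer the query *)
  | Restart_at of string  (* re-issue the query for this name *)
  | No_data

(* Domain name comparison is case-insensitive. *)
let same_name a b = String.lowercase_ascii a = String.lowercase_ascii b

let follow_cname ~qname ~qtype (answer : rr list) : step =
  let at name = List.filter (fun rr -> same_name rr.owner name) answer in
  let is_a rr = match rr.rdata with A _ -> true | Cname _ -> false in
  let cname_of rrs =
    List.find_map
      (fun rr -> match rr.rdata with Cname c -> Some c | A _ -> None)
      rrs
  in
  let own = at qname in
  match qtype, cname_of own with
  (* QTYPE=CNAME: the CNAME RR is the answer itself. *)
  | Q_CNAME, Some _ -> Done (List.filter (fun rr -> not (is_a rr)) own)
  | Q_CNAME, None -> No_data
  (* No alias at QNAME: return any address RRs found there. *)
  | Q_A, None ->
    (match List.filter is_a own with [] -> No_data | rrs -> Done rrs)
  (* Alias at QNAME: use the canonical name's data if the reply has it,
     otherwise restart the search at the canonical name. *)
  | Q_A, Some canonical ->
    (match List.filter is_a (at canonical) with
     | [] -> Restart_at canonical
     | rrs -> Done rrs)
```

Given the answer section shown above, `follow_cname ~qname:"USC-ISIC.ARPA." ~qtype:Q_A` yields `Done` with the A RR for C.ISI.EDU; if the reply held only the CNAME RR, it would yield `Restart_at "C.ISI.EDU."`.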
2488 + 2489 + If the same query was sent to C.ISI.EDU, its response might be the same 2490 + as shown above if it had its own address in its cache, but might also 2491 + be: 2492 + 2493 + 2494 + 2495 + 2496 + 2497 + 2498 + 2499 + 2500 + 2501 + 2502 + 2503 + 2504 + 2505 + 2506 + 2507 + 2508 + 2509 + 2510 + 2511 + 2512 + 2513 + 2514 + 2515 + 2516 + Mockapetris [Page 45] 2517 + 2518 + RFC 1034 Domain Concepts and Facilities November 1987 2519 + 2520 + 2521 + +---------------------------------------------------+ 2522 + Header | OPCODE=SQUERY, RESPONSE, AA | 2523 + +---------------------------------------------------+ 2524 + Question | QNAME=USC-ISIC.ARPA., QCLASS=IN, QTYPE=A | 2525 + +---------------------------------------------------+ 2526 + Answer | USC-ISIC.ARPA. 86400 IN CNAME C.ISI.EDU. | 2527 + +---------------------------------------------------+ 2528 + Authority | ISI.EDU. 172800 IN NS VAXA.ISI.EDU. | 2529 + | NS A.ISI.EDU. | 2530 + | NS VENERA.ISI.EDU. | 2531 + +---------------------------------------------------+ 2532 + Additional | VAXA.ISI.EDU. 172800 A 10.2.0.27 | 2533 + | 172800 A 128.9.0.33 | 2534 + | VENERA.ISI.EDU. 172800 A 10.1.0.52 | 2535 + | 172800 A 128.9.0.32 | 2536 + | A.ISI.EDU. 172800 A 26.3.0.103 | 2537 + +---------------------------------------------------+ 2538 + 2539 + This reply contains an authoritative reply for the alias USC-ISIC.ARPA, 2540 + plus a referral to the name servers for ISI.EDU. This sort of reply 2541 + isn't very likely given that the query is for the host name of the name 2542 + server being asked, but would be common for other aliases. 2543 + 2544 + 6.2.8. 
QNAME=USC-ISIC.ARPA, QTYPE=CNAME

If this query is sent to either A.ISI.EDU or C.ISI.EDU, the reply would
be:

               +---------------------------------------------------+
    Header     | OPCODE=SQUERY, RESPONSE, AA                       |
               +---------------------------------------------------+
    Question   | QNAME=USC-ISIC.ARPA., QCLASS=IN, QTYPE=CNAME      |
               +---------------------------------------------------+
    Answer     | USC-ISIC.ARPA. 86400 IN CNAME C.ISI.EDU.          |
               +---------------------------------------------------+
    Authority  | <empty>                                           |
               +---------------------------------------------------+
    Additional | <empty>                                           |
               +---------------------------------------------------+

Because QTYPE=CNAME, the CNAME RR itself answers the query, and the name
server doesn't attempt to look up anything for C.ISI.EDU. (Except
possibly for the additional section.)

6.3. Example resolution

The following examples illustrate the operations a resolver must perform
for its client. We assume that the resolver is starting without a
cache, as might be the case after system boot. We further assume that
the system is not one of the hosts in the data and that the host is
located somewhere on net 26, and that its safety belt (SBELT) data
structure has the following information:

    Match count = -1
    SRI-NIC.ARPA.   26.0.0.73       10.0.0.51
    A.ISI.EDU.      26.3.0.103

This information specifies servers to try, their addresses, and a match
count of -1, which says that the servers aren't very close to the
target. Note that the -1 isn't supposed to be an accurate closeness
measure, just a value so that later stages of the algorithm will work.
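The match count kept in SBELT and SLIST can be pictured as the number of labels that a server's zone shares with the name being resolved (SNAME). The OCaml sketch below is one hedged reading of that idea, not the RFC's normative algorithm: the helper names `labels`, `match_count`, and `accept_delegation` are hypothetical, and counting the root label (so that a delegation to ISI.EDU scores 3) is an assumption chosen to agree with the worked example in section 6.3.1.

```ocaml
(* Split a domain name into lower-cased labels, ignoring the trailing dot.
   "ISI.EDU." gives ["isi"; "edu"]; the root "." gives []. *)
let labels name =
  String.split_on_char '.' (String.lowercase_ascii name)
  |> List.filter (fun l -> l <> "")

(* True when [zone] is SNAME itself or one of its ancestors. Labels are
   compared from the root end, so both lists are reversed first. *)
let is_ancestor ~zone ~sname =
  let rec prefix a b =
    match a, b with
    | [], _ -> true
    | x :: a', y :: b' -> x = y && prefix a' b'
    | _ :: _, [] -> false
  in
  prefix (List.rev (labels zone)) (List.rev (labels sname))

(* Assumed convention: the root label counts as one match. *)
let match_count ~zone ~sname =
  if is_ancestor ~zone ~sname
  then Some (List.length (labels zone) + 1)
  else None

(* The placeholder value carried by the safety belt. *)
let sbelt_match_count = -1

(* A delegation is worth following only if it is strictly closer to
   SNAME than the servers already held in SLIST. *)
let accept_delegation ~current ~zone ~sname =
  match match_count ~zone ~sname with
  | Some n -> n > current
  | None -> false
```

Starting from the safety belt, `accept_delegation ~current:sbelt_match_count ~zone:"ISI.EDU." ~sname:"ISI.EDU."` holds, whereas a referral to a zone that is not an ancestor of SNAME is rejected as bogus.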
2590 + 2591 + The following examples illustrate the use of a cache, so each example 2592 + assumes that previous requests have completed. 2593 + 2594 + 6.3.1. Resolve MX for ISI.EDU. 2595 + 2596 + Suppose the first request to the resolver comes from the local mailer, 2597 + which has mail for PVM@ISI.EDU. The mailer might then ask for type MX 2598 + RRs for the domain name ISI.EDU. 2599 + 2600 + The resolver would look in its cache for MX RRs at ISI.EDU, but the 2601 + empty cache wouldn't be helpful. The resolver would recognize that it 2602 + needed to query foreign servers and try to determine the best servers to 2603 + query. This search would look for NS RRs for the domains ISI.EDU, EDU, 2604 + and the root. These searches of the cache would also fail. As a last 2605 + resort, the resolver would use the information from the SBELT, copying 2606 + it into its SLIST structure. 2607 + 2608 + At this point the resolver would need to pick one of the three available 2609 + addresses to try. Given that the resolver is on net 26, it should 2610 + choose either 26.0.0.73 or 26.3.0.103 as its first choice. 
It would 2611 + then send off a query of the form: 2612 + 2613 + 2614 + 2615 + 2616 + 2617 + 2618 + 2619 + 2620 + 2621 + 2622 + 2623 + 2624 + 2625 + 2626 + 2627 + 2628 + Mockapetris [Page 47] 2629 + 2630 + RFC 1034 Domain Concepts and Facilities November 1987 2631 + 2632 + 2633 + +---------------------------------------------------+ 2634 + Header | OPCODE=SQUERY | 2635 + +---------------------------------------------------+ 2636 + Question | QNAME=ISI.EDU., QCLASS=IN, QTYPE=MX | 2637 + +---------------------------------------------------+ 2638 + Answer | <empty> | 2639 + +---------------------------------------------------+ 2640 + Authority | <empty> | 2641 + +---------------------------------------------------+ 2642 + Additional | <empty> | 2643 + +---------------------------------------------------+ 2644 + 2645 + The resolver would then wait for a response to its query or a timeout. 2646 + If the timeout occurs, it would try different servers, then different 2647 + addresses of the same servers, lastly retrying addresses already tried. 2648 + It might eventually receive a reply from SRI-NIC.ARPA: 2649 + 2650 + +---------------------------------------------------+ 2651 + Header | OPCODE=SQUERY, RESPONSE | 2652 + +---------------------------------------------------+ 2653 + Question | QNAME=ISI.EDU., QCLASS=IN, QTYPE=MX | 2654 + +---------------------------------------------------+ 2655 + Answer | <empty> | 2656 + +---------------------------------------------------+ 2657 + Authority | ISI.EDU. 172800 IN NS VAXA.ISI.EDU. | 2658 + | NS A.ISI.EDU. | 2659 + | NS VENERA.ISI.EDU.| 2660 + +---------------------------------------------------+ 2661 + Additional | VAXA.ISI.EDU. 172800 A 10.2.0.27 | 2662 + | 172800 A 128.9.0.33 | 2663 + | VENERA.ISI.EDU. 172800 A 10.1.0.52 | 2664 + | 172800 A 128.9.0.32 | 2665 + | A.ISI.EDU. 
172800 A 26.3.0.103 | 2666 + +---------------------------------------------------+ 2667 + 2668 + The resolver would notice that the information in the response gave a 2669 + closer delegation to ISI.EDU than its existing SLIST (since it matches 2670 + three labels). The resolver would then cache the information in this 2671 + response and use it to set up a new SLIST: 2672 + 2673 + Match count = 3 2674 + A.ISI.EDU. 26.3.0.103 2675 + VAXA.ISI.EDU. 10.2.0.27 128.9.0.33 2676 + VENERA.ISI.EDU. 10.1.0.52 128.9.0.32 2677 + 2678 + A.ISI.EDU appears on this list as well as the previous one, but that is 2679 + purely coincidental. The resolver would again start transmitting and 2680 + waiting for responses. Eventually it would get an answer: 2681 + 2682 + 2683 + 2684 + Mockapetris [Page 48] 2685 + 2686 + RFC 1034 Domain Concepts and Facilities November 1987 2687 + 2688 + 2689 + +---------------------------------------------------+ 2690 + Header | OPCODE=SQUERY, RESPONSE, AA | 2691 + +---------------------------------------------------+ 2692 + Question | QNAME=ISI.EDU., QCLASS=IN, QTYPE=MX | 2693 + +---------------------------------------------------+ 2694 + Answer | ISI.EDU. MX 10 VENERA.ISI.EDU. | 2695 + | MX 20 VAXA.ISI.EDU. | 2696 + +---------------------------------------------------+ 2697 + Authority | <empty> | 2698 + +---------------------------------------------------+ 2699 + Additional | VAXA.ISI.EDU. 172800 A 10.2.0.27 | 2700 + | 172800 A 128.9.0.33 | 2701 + | VENERA.ISI.EDU. 172800 A 10.1.0.52 | 2702 + | 172800 A 128.9.0.32 | 2703 + +---------------------------------------------------+ 2704 + 2705 + The resolver would add this information to its cache, and return the MX 2706 + RRs to its client. 2707 + 2708 + 6.3.2. Get the host name for address 26.6.0.65 2709 + 2710 + The resolver would translate this into a request for PTR RRs for 2711 + 65.0.6.26.IN-ADDR.ARPA. This information is not in the cache, so the 2712 + resolver would look for foreign servers to ask. 
No servers would match, 2713 + so it would use SBELT again. (Note that the servers for the ISI.EDU 2714 + domain are in the cache, but ISI.EDU is not an ancestor of 2715 + 65.0.6.26.IN-ADDR.ARPA, so the SBELT is used.) 2716 + 2717 + Since this request is within the authoritative data of both servers in 2718 + SBELT, eventually one would return: 2719 + 2720 + 2721 + 2722 + 2723 + 2724 + 2725 + 2726 + 2727 + 2728 + 2729 + 2730 + 2731 + 2732 + 2733 + 2734 + 2735 + 2736 + 2737 + 2738 + 2739 + 2740 + Mockapetris [Page 49] 2741 + 2742 + RFC 1034 Domain Concepts and Facilities November 1987 2743 + 2744 + 2745 + +---------------------------------------------------+ 2746 + Header | OPCODE=SQUERY, RESPONSE, AA | 2747 + +---------------------------------------------------+ 2748 + Question | QNAME=65.0.6.26.IN-ADDR.ARPA.,QCLASS=IN,QTYPE=PTR | 2749 + +---------------------------------------------------+ 2750 + Answer | 65.0.6.26.IN-ADDR.ARPA. PTR ACC.ARPA. | 2751 + +---------------------------------------------------+ 2752 + Authority | <empty> | 2753 + +---------------------------------------------------+ 2754 + Additional | <empty> | 2755 + +---------------------------------------------------+ 2756 + 2757 + 6.3.3. Get the host address of poneria.ISI.EDU 2758 + 2759 + This request would translate into a type A request for poneria.ISI.EDU. 2760 + The resolver would not find any cached data for this name, but would 2761 + find the NS RRs in the cache for ISI.EDU when it looks for foreign 2762 + servers to ask. Using this data, it would construct a SLIST of the 2763 + form: 2764 + 2765 + Match count = 3 2766 + 2767 + A.ISI.EDU. 26.3.0.103 2768 + VAXA.ISI.EDU. 10.2.0.27 128.9.0.33 2769 + VENERA.ISI.EDU. 10.1.0.52 2770 + 2771 + A.ISI.EDU is listed first on the assumption that the resolver orders its 2772 + choices by preference, and A.ISI.EDU is on the same network. 2773 + 2774 + One of these servers would answer the query. 2775 + 2776 + 7. 
REFERENCES and BIBLIOGRAPHY 2777 + 2778 + [Dyer 87] Dyer, S., and F. Hsu, "Hesiod", Project Athena 2779 + Technical Plan - Name Service, April 1987, version 1.9. 2780 + 2781 + Describes the fundamentals of the Hesiod name service. 2782 + 2783 + [IEN-116] J. Postel, "Internet Name Server", IEN-116, 2784 + USC/Information Sciences Institute, August 1979. 2785 + 2786 + A name service obsoleted by the Domain Name System, but 2787 + still in use. 2788 + 2789 + 2790 + 2791 + 2792 + 2793 + 2794 + 2795 + 2796 + Mockapetris [Page 50] 2797 + 2798 + RFC 1034 Domain Concepts and Facilities November 1987 2799 + 2800 + 2801 + [Quarterman 86] Quarterman, J., and J. Hoskins, "Notable Computer 2802 + Networks",Communications of the ACM, October 1986, 2803 + volume 29, number 10. 2804 + 2805 + [RFC-742] K. Harrenstien, "NAME/FINGER", RFC-742, Network 2806 + Information Center, SRI International, December 1977. 2807 + 2808 + [RFC-768] J. Postel, "User Datagram Protocol", RFC-768, 2809 + USC/Information Sciences Institute, August 1980. 2810 + 2811 + [RFC-793] J. Postel, "Transmission Control Protocol", RFC-793, 2812 + USC/Information Sciences Institute, September 1981. 2813 + 2814 + [RFC-799] D. Mills, "Internet Name Domains", RFC-799, COMSAT, 2815 + September 1981. 2816 + 2817 + Suggests introduction of a hierarchy in place of a flat 2818 + name space for the Internet. 2819 + 2820 + [RFC-805] J. Postel, "Computer Mail Meeting Notes", RFC-805, 2821 + USC/Information Sciences Institute, February 1982. 2822 + 2823 + [RFC-810] E. Feinler, K. Harrenstien, Z. Su, and V. White, "DOD 2824 + Internet Host Table Specification", RFC-810, Network 2825 + Information Center, SRI International, March 1982. 2826 + 2827 + Obsolete. See RFC-952. 2828 + 2829 + [RFC-811] K. Harrenstien, V. White, and E. Feinler, "Hostnames 2830 + Server", RFC-811, Network Information Center, SRI 2831 + International, March 1982. 2832 + 2833 + Obsolete. See RFC-953. 2834 + 2835 + [RFC-812] K. Harrenstien, and V. 
White, "NICNAME/WHOIS", RFC-812,
                Network Information Center, SRI International, March
                1982.

[RFC-819]       Z. Su, and J. Postel, "The Domain Naming Convention for
                Internet User Applications", RFC-819, Network
                Information Center, SRI International, August 1982.

                Early thoughts on the design of the domain system.
                Current implementation is completely different.

[RFC-821]       J. Postel, "Simple Mail Transfer Protocol", RFC-821,
                USC/Information Sciences Institute, August 1982.

[RFC-830]       Z. Su, "A Distributed System for Internet Name Service",
                RFC-830, Network Information Center, SRI International,
                October 1982.

                Early thoughts on the design of the domain system.
                Current implementation is completely different.

[RFC-882]       P. Mockapetris, "Domain names - Concepts and
                Facilities," RFC-882, USC/Information Sciences
                Institute, November 1983.

                Superseded by this memo.

[RFC-883]       P. Mockapetris, "Domain names - Implementation and
                Specification," RFC-883, USC/Information Sciences
                Institute, November 1983.

                Superseded by this memo.

[RFC-920]       J. Postel and J. Reynolds, "Domain Requirements",
                RFC-920, USC/Information Sciences Institute,
                October 1984.

                Explains the naming scheme for top level domains.

[RFC-952]       K. Harrenstien, M. Stahl, E. Feinler, "DoD Internet Host
                Table Specification", RFC-952, SRI, October 1985.

                Specifies the format of HOSTS.TXT, the host/address
                table replaced by the DNS.

[RFC-953]       K. Harrenstien, M. Stahl, E. Feinler, "HOSTNAME Server",
                RFC-953, SRI, October 1985.
2890 + 2891 + This RFC contains the official specification of the 2892 + hostname server protocol, which is obsoleted by the DNS. 2893 + This TCP based protocol accesses information stored in 2894 + the RFC-952 format, and is used to obtain copies of the 2895 + host table. 2896 + 2897 + [RFC-973] P. Mockapetris, "Domain System Changes and 2898 + Observations", RFC-973, USC/Information Sciences 2899 + Institute, January 1986. 2900 + 2901 + Describes changes to RFC-882 and RFC-883 and reasons for 2902 + them. Now obsolete. 2903 + 2904 + 2905 + 2906 + 2907 + 2908 + Mockapetris [Page 52] 2909 + 2910 + RFC 1034 Domain Concepts and Facilities November 1987 2911 + 2912 + 2913 + [RFC-974] C. Partridge, "Mail routing and the domain system", 2914 + RFC-974, CSNET CIC BBN Labs, January 1986. 2915 + 2916 + Describes the transition from HOSTS.TXT based mail 2917 + addressing to the more powerful MX system used with the 2918 + domain system. 2919 + 2920 + [RFC-1001] NetBIOS Working Group, "Protocol standard for a NetBIOS 2921 + service on a TCP/UDP transport: Concepts and Methods", 2922 + RFC-1001, March 1987. 2923 + 2924 + This RFC and RFC-1002 are a preliminary design for 2925 + NETBIOS on top of TCP/IP which proposes to base NetBIOS 2926 + name service on top of the DNS. 2927 + 2928 + [RFC-1002] NetBIOS Working Group, "Protocol standard for a NetBIOS 2929 + service on a TCP/UDP transport: Detailed 2930 + Specifications", RFC-1002, March 1987. 2931 + 2932 + [RFC-1010] J. Reynolds and J. Postel, "Assigned Numbers", RFC-1010, 2933 + USC/Information Sciences Institute, May 1987 2934 + 2935 + Contains socket numbers and mnemonics for host names, 2936 + operating systems, etc. 2937 + 2938 + [RFC-1031] W. Lazear, "MILNET Name Domain Transition", RFC-1031, 2939 + November 1987. 2940 + 2941 + Describes a plan for converting the MILNET to the DNS. 2942 + 2943 + [RFC-1032] M. K. Stahl, "Establishing a Domain - Guidelines for 2944 + Administrators", RFC-1032, November 1987. 
2945 + 2946 + Describes the registration policies used by the NIC to 2947 + administer the top level domains and delegate subzones. 2948 + 2949 + [RFC-1033] M. K. Lottor, "Domain Administrators Operations Guide", 2950 + RFC-1033, November 1987. 2951 + 2952 + A cookbook for domain administrators. 2953 + 2954 + [Solomon 82] M. Solomon, L. Landweber, and D. Neuhengen, "The CSNET 2955 + Name Server", Computer Networks, vol 6, nr 3, July 1982. 2956 + 2957 + Describes a name service for CSNET which is independent 2958 + from the DNS and DNS use in the CSNET. 2959 + 2960 + 2961 + 2962 + 2963 + 2964 + Mockapetris [Page 53] 2965 + 2966 + RFC 1034 Domain Concepts and Facilities November 1987 2967 + 2968 + 2969 + Index 2970 + 2971 + A 12 2972 + Absolute names 8 2973 + Aliases 14, 31 2974 + Authority 6 2975 + AXFR 17 2976 + 2977 + Case of characters 7 2978 + CH 12 2979 + CNAME 12, 13, 31 2980 + Completion queries 18 2981 + 2982 + Domain name 6, 7 2983 + 2984 + Glue RRs 20 2985 + 2986 + HINFO 12 2987 + 2988 + IN 12 2989 + Inverse queries 16 2990 + Iterative 4 2991 + 2992 + Label 7 2993 + 2994 + Mailbox names 9 2995 + MX 12 2996 + 2997 + Name error 27, 36 2998 + Name servers 5, 17 2999 + NE 30 3000 + Negative caching 44 3001 + NS 12 3002 + 3003 + Opcode 16 3004 + 3005 + PTR 12 3006 + 3007 + QCLASS 16 3008 + QTYPE 16 3009 + 3010 + RDATA 13 3011 + Recursive 4 3012 + Recursive service 22 3013 + Relative names 7 3014 + Resolvers 6 3015 + RR 12 3016 + 3017 + 3018 + 3019 + 3020 + Mockapetris [Page 54] 3021 + 3022 + RFC 1034 Domain Concepts and Facilities November 1987 3023 + 3024 + 3025 + Safety belt 33 3026 + Sections 16 3027 + SOA 12 3028 + Standard queries 22 3029 + 3030 + Status queries 18 3031 + Stub resolvers 32 3032 + 3033 + TTL 12, 13 3034 + 3035 + Wildcards 25 3036 + 3037 + Zone transfers 28 3038 + Zones 19 3039 + 3040 + 3041 + 3042 + 3043 + 3044 + 3045 + 3046 + 3047 + 3048 + 3049 + 3050 + 3051 + 3052 + 3053 + 3054 + 3055 + 3056 + 3057 + 3058 + 3059 + 3060 + 3061 + 3062 + 
+3077
spec/rfc1035.txt
··· 1 + Network Working Group P. Mockapetris 2 + Request for Comments: 1035 ISI 3 + November 1987 4 + Obsoletes: RFCs 882, 883, 973 5 + 6 + DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION 7 + 8 + 9 + 1. STATUS OF THIS MEMO 10 + 11 + This RFC describes the details of the domain system and protocol, and 12 + assumes that the reader is familiar with the concepts discussed in a 13 + companion RFC, "Domain Names - Concepts and Facilities" [RFC-1034]. 14 + 15 + The domain system is a mixture of functions and data types which are an 16 + official protocol and functions and data types which are still 17 + experimental. Since the domain system is intentionally extensible, new 18 + data types and experimental behavior should always be expected in parts 19 + of the system beyond the official protocol. The official protocol parts 20 + include standard queries, responses and the Internet class RR data 21 + formats (e.g., host addresses). Since the previous RFC set, several 22 + definitions have changed, so some previous definitions are obsolete. 23 + 24 + Experimental or obsolete features are clearly marked in these RFCs, and 25 + such information should be used with caution. 26 + 27 + The reader is especially cautioned not to depend on the values which 28 + appear in examples to be current or complete, since their purpose is 29 + primarily pedagogical. Distribution of this memo is unlimited. 30 + 31 + Table of Contents 32 + 33 + 1. STATUS OF THIS MEMO 1 34 + 2. INTRODUCTION 3 35 + 2.1. Overview 3 36 + 2.2. Common configurations 4 37 + 2.3. Conventions 7 38 + 2.3.1. Preferred name syntax 7 39 + 2.3.2. Data Transmission Order 8 40 + 2.3.3. Character Case 9 41 + 2.3.4. Size limits 10 42 + 3. DOMAIN NAME SPACE AND RR DEFINITIONS 10 43 + 3.1. Name space definitions 10 44 + 3.2. RR definitions 11 45 + 3.2.1. Format 11 46 + 3.2.2. TYPE values 12 47 + 3.2.3. QTYPE values 12 48 + 3.2.4. 
CLASS values 13 49 + 50 + 51 + 52 + Mockapetris [Page 1] 53 + 54 + RFC 1035 Domain Implementation and Specification November 1987 55 + 56 + 57 + 3.2.5. QCLASS values 13 58 + 3.3. Standard RRs 13 59 + 3.3.1. CNAME RDATA format 14 60 + 3.3.2. HINFO RDATA format 14 61 + 3.3.3. MB RDATA format (EXPERIMENTAL) 14 62 + 3.3.4. MD RDATA format (Obsolete) 15 63 + 3.3.5. MF RDATA format (Obsolete) 15 64 + 3.3.6. MG RDATA format (EXPERIMENTAL) 16 65 + 3.3.7. MINFO RDATA format (EXPERIMENTAL) 16 66 + 3.3.8. MR RDATA format (EXPERIMENTAL) 17 67 + 3.3.9. MX RDATA format 17 68 + 3.3.10. NULL RDATA format (EXPERIMENTAL) 17 69 + 3.3.11. NS RDATA format 18 70 + 3.3.12. PTR RDATA format 18 71 + 3.3.13. SOA RDATA format 19 72 + 3.3.14. TXT RDATA format 20 73 + 3.4. ARPA Internet specific RRs 20 74 + 3.4.1. A RDATA format 20 75 + 3.4.2. WKS RDATA format 21 76 + 3.5. IN-ADDR.ARPA domain 22 77 + 3.6. Defining new types, classes, and special namespaces 24 78 + 4. MESSAGES 25 79 + 4.1. Format 25 80 + 4.1.1. Header section format 26 81 + 4.1.2. Question section format 28 82 + 4.1.3. Resource record format 29 83 + 4.1.4. Message compression 30 84 + 4.2. Transport 32 85 + 4.2.1. UDP usage 32 86 + 4.2.2. TCP usage 32 87 + 5. MASTER FILES 33 88 + 5.1. Format 33 89 + 5.2. Use of master files to define zones 35 90 + 5.3. Master file example 36 91 + 6. NAME SERVER IMPLEMENTATION 37 92 + 6.1. Architecture 37 93 + 6.1.1. Control 37 94 + 6.1.2. Database 37 95 + 6.1.3. Time 39 96 + 6.2. Standard query processing 39 97 + 6.3. Zone refresh and reload processing 39 98 + 6.4. Inverse queries (Optional) 40 99 + 6.4.1. The contents of inverse queries and responses 40 100 + 6.4.2. Inverse query and response example 41 101 + 6.4.3. Inverse query processing 42 102 + 103 + 104 + 105 + 106 + 107 + 108 + Mockapetris [Page 2] 109 + 110 + RFC 1035 Domain Implementation and Specification November 1987 111 + 112 + 113 + 6.5. Completion queries and responses 42 114 + 7. RESOLVER IMPLEMENTATION 43 115 + 7.1. 
Transforming a user request into a query 43 116 + 7.2. Sending the queries 44 117 + 7.3. Processing responses 46 118 + 7.4. Using the cache 47 119 + 8. MAIL SUPPORT 47 120 + 8.1. Mail exchange binding 48 121 + 8.2. Mailbox binding (Experimental) 48 122 + 9. REFERENCES and BIBLIOGRAPHY 50 123 + Index 54 124 + 125 + 2. INTRODUCTION 126 + 127 + 2.1. Overview 128 + 129 + The goal of domain names is to provide a mechanism for naming resources 130 + in such a way that the names are usable in different hosts, networks, 131 + protocol families, internets, and administrative organizations. 132 + 133 + From the user's point of view, domain names are useful as arguments to a 134 + local agent, called a resolver, which retrieves information associated 135 + with the domain name. Thus a user might ask for the host address or 136 + mail information associated with a particular domain name. To enable 137 + the user to request a particular type of information, an appropriate 138 + query type is passed to the resolver with the domain name. To the user, 139 + the domain tree is a single information space; the resolver is 140 + responsible for hiding the distribution of data among name servers from 141 + the user. 142 + 143 + From the resolver's point of view, the database that makes up the domain 144 + space is distributed among various name servers. Different parts of the 145 + domain space are stored in different name servers, although a particular 146 + data item will be stored redundantly in two or more name servers. The 147 + resolver starts with knowledge of at least one name server. When the 148 + resolver processes a user query it asks a known name server for the 149 + information; in return, the resolver either receives the desired 150 + information or a referral to another name server. Using these 151 + referrals, resolvers learn the identities and contents of other name 152 + servers. 
Resolvers are responsible for dealing with the distribution of 153 + the domain space and dealing with the effects of name server failure by 154 + consulting redundant databases in other servers. 155 + 156 + Name servers manage two kinds of data. The first kind of data is held in 157 + sets called zones; each zone is the complete database for a particular 158 + "pruned" subtree of the domain space. This data is called 159 + authoritative. A name server periodically checks to make sure that its 160 + zones are up to date, and if not, obtains a new copy of updated zones 161 + 162 + 163 + 164 + Mockapetris [Page 3] 165 + 166 + RFC 1035 Domain Implementation and Specification November 1987 167 + 168 + 169 + from master files stored locally or in another name server. The second 170 + kind of data is cached data which was acquired by a local resolver. 171 + This data may be incomplete, but improves the performance of the 172 + retrieval process when non-local data is repeatedly accessed. Cached 173 + data is eventually discarded by a timeout mechanism. 174 + 175 + This functional structure isolates the problems of user interface, 176 + failure recovery, and distribution in the resolvers and isolates the 177 + database update and refresh problems in the name servers. 178 + 179 + 2.2. Common configurations 180 + 181 + A host can participate in the domain name system in a number of ways, 182 + depending on whether the host runs programs that retrieve information 183 + from the domain system, name servers that answer queries from other 184 + hosts, or various combinations of both functions.
The simplest, and 185 + perhaps most typical, configuration is shown below: 186 + 187 + Local Host | Foreign 188 + | 189 + +---------+ +----------+ | +--------+ 190 + | | user queries | |queries | | | 191 + | User |-------------->| |---------|->|Foreign | 192 + | Program | | Resolver | | | Name | 193 + | |<--------------| |<--------|--| Server | 194 + | | user responses| |responses| | | 195 + +---------+ +----------+ | +--------+ 196 + | A | 197 + cache additions | | references | 198 + V | | 199 + +----------+ | 200 + | cache | | 201 + +----------+ | 202 + 203 + User programs interact with the domain name space through resolvers; the 204 + format of user queries and user responses is specific to the host and 205 + its operating system. User queries will typically be operating system 206 + calls, and the resolver and its cache will be part of the host operating 207 + system. Less capable hosts may choose to implement the resolver as a 208 + subroutine to be linked in with every program that needs its services. 209 + Resolvers answer user queries with information they acquire via queries 210 + to foreign name servers and the local cache. 211 + 212 + Note that the resolver may have to make several queries to several 213 + different foreign name servers to answer a particular user query, and 214 + hence the resolution of a user query may involve several network 215 + accesses and an arbitrary amount of time. The queries to foreign name 216 + servers and the corresponding responses have a standard format described 217 + 218 + 219 + 220 + Mockapetris [Page 4] 221 + 222 + RFC 1035 Domain Implementation and Specification November 1987 223 + 224 + 225 + in this memo, and may be datagrams. 226 + 227 + Depending on its capabilities, a name server could be a stand alone 228 + program on a dedicated machine or a process or processes on a large 229 + timeshared host. 
A simple configuration might be: 230 + 231 + Local Host | Foreign 232 + | 233 + +---------+ | 234 + / /| | 235 + +---------+ | +----------+ | +--------+ 236 + | | | | |responses| | | 237 + | | | | Name |---------|->|Foreign | 238 + | Master |-------------->| Server | | |Resolver| 239 + | files | | | |<--------|--| | 240 + | |/ | | queries | +--------+ 241 + +---------+ +----------+ | 242 + 243 + Here a primary name server acquires information about one or more zones 244 + by reading master files from its local file system, and answers queries 245 + about those zones that arrive from foreign resolvers. 246 + 247 + The DNS requires that all zones be redundantly supported by more than 248 + one name server. Designated secondary servers can acquire zones and 249 + check for updates from the primary server using the zone transfer 250 + protocol of the DNS. This configuration is shown below: 251 + 252 + Local Host | Foreign 253 + | 254 + +---------+ | 255 + / /| | 256 + +---------+ | +----------+ | +--------+ 257 + | | | | |responses| | | 258 + | | | | Name |---------|->|Foreign | 259 + | Master |-------------->| Server | | |Resolver| 260 + | files | | | |<--------|--| | 261 + | |/ | | queries | +--------+ 262 + +---------+ +----------+ | 263 + A |maintenance | +--------+ 264 + | +------------|->| | 265 + | queries | |Foreign | 266 + | | | Name | 267 + +------------------|--| Server | 268 + maintenance responses | +--------+ 269 + 270 + In this configuration, the name server periodically establishes a 271 + virtual circuit to a foreign name server to acquire a copy of a zone or 272 + to check that an existing copy has not changed. The messages sent for 273 + 274 + 275 + 276 + Mockapetris [Page 5] 277 + 278 + RFC 1035 Domain Implementation and Specification November 1987 279 + 280 + 281 + these maintenance activities follow the same form as queries and 282 + responses, but the message sequences are somewhat different. 
283 + 284 + The information flow in a host that supports all aspects of the domain 285 + name system is shown below: 286 + 287 + Local Host | Foreign 288 + | 289 + +---------+ +----------+ | +--------+ 290 + | | user queries | |queries | | | 291 + | User |-------------->| |---------|->|Foreign | 292 + | Program | | Resolver | | | Name | 293 + | |<--------------| |<--------|--| Server | 294 + | | user responses| |responses| | | 295 + +---------+ +----------+ | +--------+ 296 + | A | 297 + cache additions | | references | 298 + V | | 299 + +----------+ | 300 + | Shared | | 301 + | database | | 302 + +----------+ | 303 + A | | 304 + +---------+ refreshes | | references | 305 + / /| | V | 306 + +---------+ | +----------+ | +--------+ 307 + | | | | |responses| | | 308 + | | | | Name |---------|->|Foreign | 309 + | Master |-------------->| Server | | |Resolver| 310 + | files | | | |<--------|--| | 311 + | |/ | | queries | +--------+ 312 + +---------+ +----------+ | 313 + A |maintenance | +--------+ 314 + | +------------|->| | 315 + | queries | |Foreign | 316 + | | | Name | 317 + +------------------|--| Server | 318 + maintenance responses | +--------+ 319 + 320 + The shared database holds domain space data for the local name server 321 + and resolver. The contents of the shared database will typically be a 322 + mixture of authoritative data maintained by the periodic refresh 323 + operations of the name server and cached data from previous resolver 324 + requests. The structure of the domain data and the necessity for 325 + synchronization between name servers and resolvers imply the general 326 + characteristics of this database, but the actual format is up to the 327 + local implementor. 328 + 329 + 330 + 331 + 332 + Mockapetris [Page 6] 333 + 334 + RFC 1035 Domain Implementation and Specification November 1987 335 + 336 + 337 + Information flow can also be tailored so that a group of hosts act 338 + together to optimize activities. 
Sometimes this is done to offload less 339 + capable hosts so that they do not have to implement a full resolver. 340 + This can be appropriate for PCs or hosts which want to minimize the 341 + amount of new network code which is required. This scheme can also 342 + allow a group of hosts to share a small number of caches rather than 343 + maintaining a large number of separate caches, on the premise that the 344 + centralized caches will have a higher hit ratio. In either case, 345 + resolvers are replaced with stub resolvers which act as front ends to 346 + resolvers located in a recursive server in one or more name servers 347 + known to perform that service: 348 + 349 + Local Hosts | Foreign 350 + | 351 + +---------+ | 352 + | | responses | 353 + | Stub |<--------------------+ | 354 + | Resolver| | | 355 + | |----------------+ | | 356 + +---------+ recursive | | | 357 + queries | | | 358 + V | | 359 + +---------+ recursive +----------+ | +--------+ 360 + | | queries | |queries | | | 361 + | Stub |-------------->| Recursive|---------|->|Foreign | 362 + | Resolver| | Server | | | Name | 363 + | |<--------------| |<--------|--| Server | 364 + +---------+ responses | |responses| | | 365 + +----------+ | +--------+ 366 + | Central | | 367 + | cache | | 368 + +----------+ | 369 + 370 + In any case, note that domain components are always replicated for 371 + reliability whenever possible. 372 + 373 + 2.3. Conventions 374 + 375 + The domain system has several conventions dealing with low-level, but 376 + fundamental, issues. While the implementor is free to violate these 377 + conventions WITHIN HIS OWN SYSTEM, he must observe these conventions in 378 + ALL behavior observed from other hosts. 379 + 380 + 2.3.1. Preferred name syntax 381 + 382 + The DNS specifications attempt to be as general as possible in the rules 383 + for constructing domain names. The idea is that the name of any 384 + existing object can be expressed as a domain name with minimal changes.
385 + 386 + 387 + 388 + Mockapetris [Page 7] 389 + 390 + RFC 1035 Domain Implementation and Specification November 1987 391 + 392 + 393 + However, when assigning a domain name for an object, the prudent user 394 + will select a name which satisfies both the rules of the domain system 395 + and any existing rules for the object, whether these rules are published 396 + or implied by existing programs. 397 + 398 + For example, when naming a mail domain, the user should satisfy both the 399 + rules of this memo and those in RFC-822. When creating a new host name, 400 + the old rules for HOSTS.TXT should be followed. This avoids problems 401 + when old software is converted to use domain names. 402 + 403 + The following syntax will result in fewer problems with many 404 + 405 + applications that use domain names (e.g., mail, TELNET). 406 + 407 + <domain> ::= <subdomain> | " " 408 + 409 + <subdomain> ::= <label> | <subdomain> "." <label> 410 + 411 + <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ] 412 + 413 + <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str> 414 + 415 + <let-dig-hyp> ::= <let-dig> | "-" 416 + 417 + <let-dig> ::= <letter> | <digit> 418 + 419 + <letter> ::= any one of the 52 alphabetic characters A through Z in 420 + upper case and a through z in lower case 421 + 422 + <digit> ::= any one of the ten digits 0 through 9 423 + 424 + Note that while upper and lower case letters are allowed in domain 425 + names, no significance is attached to the case. That is, two names with 426 + the same spelling but different case are to be treated as if identical. 427 + 428 + The labels must follow the rules for ARPANET host names. They must 429 + start with a letter, end with a letter or digit, and have as interior 430 + characters only letters, digits, and hyphen. There are also some 431 + restrictions on the length. Labels must be 63 characters or less. 
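The preferred syntax above can be checked mechanically. What follows is a minimal OCaml sketch, not part of the RFC and not taken from this library: the function names (`is_preferred_label`, `is_preferred_subdomain`, `equal_names`) are hypothetical and exist only for illustration. It assumes ASCII input, covers only the `<subdomain>` production (not the blank root alternative), and does not enforce the 255-octet limit on whole names.

```ocaml
(* Hypothetical helpers illustrating the RFC 1035 section 2.3.1 grammar. *)
let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_digit c = c >= '0' && c <= '9'
let is_let_dig c = is_letter c || is_digit c
let is_let_dig_hyp c = is_let_dig c || c = '-'

(* <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ], 63 characters or less. *)
let is_preferred_label s =
  let n = String.length s in
  (* Every character must be a letter, digit, or hyphen. *)
  let rec all_ldh i = i >= n || (is_let_dig_hyp s.[i] && all_ldh (i + 1)) in
  n >= 1 && n <= 63
  && is_letter s.[0]        (* must start with a letter *)
  && is_let_dig s.[n - 1]   (* must end with a letter or digit *)
  && all_ldh 0

(* <subdomain> ::= <label> | <subdomain> "." <label> *)
let is_preferred_subdomain name =
  List.for_all is_preferred_label (String.split_on_char '.' name)

(* No significance is attached to case when comparing names. *)
let equal_names a b = String.lowercase_ascii a = String.lowercase_ascii b
```

Under these rules the `xn--` labels produced by Punycode pass the check, since they start with a letter and use only letters, digits, and hyphens. Labels that begin with a digit are rejected by this sketch; RFC 1123 later relaxed that restriction for host names, so a real validator may want to be more permissive.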
432 + 433 + For example, the following strings identify hosts in the Internet: 434 + 435 + A.ISI.EDU XX.LCS.MIT.EDU SRI-NIC.ARPA 436 + 437 + 2.3.2. Data Transmission Order 438 + 439 + The order of transmission of the header and data described in this 440 + document is resolved to the octet level. Whenever a diagram shows a 441 + 442 + 443 + 444 + Mockapetris [Page 8] 445 + 446 + RFC 1035 Domain Implementation and Specification November 1987 447 + 448 + 449 + group of octets, the order of transmission of those octets is the normal 450 + order in which they are read in English. For example, in the following 451 + diagram, the octets are transmitted in the order they are numbered. 452 + 453 + 0 1 454 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 455 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 456 + | 1 | 2 | 457 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 458 + | 3 | 4 | 459 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 460 + | 5 | 6 | 461 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 462 + 463 + Whenever an octet represents a numeric quantity, the left most bit in 464 + the diagram is the high order or most significant bit. That is, the bit 465 + labeled 0 is the most significant bit. For example, the following 466 + diagram represents the value 170 (decimal). 467 + 468 + 0 1 2 3 4 5 6 7 469 + +-+-+-+-+-+-+-+-+ 470 + |1 0 1 0 1 0 1 0| 471 + +-+-+-+-+-+-+-+-+ 472 + 473 + Similarly, whenever a multi-octet field represents a numeric quantity 474 + the left most bit of the whole field is the most significant bit. When 475 + a multi-octet quantity is transmitted the most significant octet is 476 + transmitted first. 477 + 478 + 2.3.3. Character Case 479 + 480 + For all parts of the DNS that are part of the official protocol, all 481 + comparisons between character strings (e.g., labels, domain names, etc.) 482 + are done in a case-insensitive manner. At present, this rule is in 483 + force throughout the domain system without exception. 
However, future 484 + additions beyond current usage may need to use the full binary octet 485 + capabilities in names, so attempts to store domain names in 7-bit ASCII 486 + or use of special bytes to terminate labels, etc., should be avoided. 487 + 488 + When data enters the domain system, its original case should be 489 + preserved whenever possible. In certain circumstances this cannot be 490 + done. For example, if two RRs are stored in a database, one at x.y and 491 + one at X.Y, they are actually stored at the same place in the database, 492 + and hence only one casing would be preserved. The basic rule is that 493 + case can be discarded only when data is used to define structure in a 494 + database, and two names are identical when compared in a case 495 + insensitive manner. 496 + 497 + 498 + 499 + 500 + Mockapetris [Page 9] 501 + 502 + RFC 1035 Domain Implementation and Specification November 1987 503 + 504 + 505 + Loss of case sensitive data must be minimized. Thus while data for x.y 506 + and X.Y may both be stored under a single location x.y or X.Y, data for 507 + a.x and B.X would never be stored under A.x, A.X, b.x, or b.X. In 508 + general, this preserves the case of the first label of a domain name, 509 + but forces standardization of interior node labels. 510 + 511 + Systems administrators who enter data into the domain database should 512 + take care to represent the data they supply to the domain system in a 513 + case-consistent manner if their system is case-sensitive. The data 514 + distribution system in the domain system will ensure that consistent 515 + representations are preserved. 516 + 517 + 2.3.4. Size limits 518 + 519 + Various objects and parameters in the DNS have size limits. They are 520 + listed below. Some could be easily changed, others are more 521 + fundamental. 522 + 523 + labels 63 octets or less 524 + 525 + names 255 octets or less 526 + 527 + TTL positive values of a signed 32 bit number. 
528 + 529 + UDP messages 512 octets or less 530 + 531 + 3. DOMAIN NAME SPACE AND RR DEFINITIONS 532 + 533 + 3.1. Name space definitions 534 + 535 + Domain names in messages are expressed in terms of a sequence of labels. 536 + Each label is represented as a one octet length field followed by that 537 + number of octets. Since every domain name ends with the null label of 538 + the root, a domain name is terminated by a length byte of zero. The 539 + high order two bits of every length octet must be zero, and the 540 + remaining six bits of the length field limit the label to 63 octets or 541 + less. 542 + 543 + To simplify implementations, the total length of a domain name (i.e., 544 + label octets and label length octets) is restricted to 255 octets or 545 + less. 546 + 547 + Although labels can contain any 8 bit values in octets that make up a 548 + label, it is strongly recommended that labels follow the preferred 549 + syntax described elsewhere in this memo, which is compatible with 550 + existing host naming conventions. Name servers and resolvers must 551 + compare labels in a case-insensitive manner (i.e., A=a), assuming ASCII 552 + with zero parity. Non-alphabetic codes must match exactly. 553 + 554 + 555 + 556 + Mockapetris [Page 10] 557 + 558 + RFC 1035 Domain Implementation and Specification November 1987 559 + 560 + 561 + 3.2. RR definitions 562 + 563 + 3.2.1. 
Format 564 + 565 + All RRs have the same top level format shown below: 566 + 567 + 1 1 1 1 1 1 568 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 569 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 570 + | | 571 + / / 572 + / NAME / 573 + | | 574 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 575 + | TYPE | 576 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 577 + | CLASS | 578 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 579 + | TTL | 580 + | | 581 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 582 + | RDLENGTH | 583 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--| 584 + / RDATA / 585 + / / 586 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 587 + 588 + 589 + where: 590 + 591 + NAME an owner name, i.e., the name of the node to which this 592 + resource record pertains. 593 + 594 + TYPE two octets containing one of the RR TYPE codes. 595 + 596 + CLASS two octets containing one of the RR CLASS codes. 597 + 598 + TTL a 32 bit signed integer that specifies the time interval 599 + that the resource record may be cached before the source 600 + of the information should again be consulted. Zero 601 + values are interpreted to mean that the RR can only be 602 + used for the transaction in progress, and should not be 603 + cached. For example, SOA records are always distributed 604 + with a zero TTL to prohibit caching. Zero values can 605 + also be used for extremely volatile data. 606 + 607 + RDLENGTH an unsigned 16 bit integer that specifies the length in 608 + octets of the RDATA field. 609 + 610 + 611 + 612 + Mockapetris [Page 11] 613 + 614 + RFC 1035 Domain Implementation and Specification November 1987 615 + 616 + 617 + RDATA a variable length string of octets that describes the 618 + resource. The format of this information varies 619 + according to the TYPE and CLASS of the resource record. 620 + 621 + 3.2.2. TYPE values 622 + 623 + TYPE fields are used in resource records. Note that these types are a 624 + subset of QTYPEs. 
625 + 626 + TYPE value and meaning 627 + 628 + A 1 a host address 629 + 630 + NS 2 an authoritative name server 631 + 632 + MD 3 a mail destination (Obsolete - use MX) 633 + 634 + MF 4 a mail forwarder (Obsolete - use MX) 635 + 636 + CNAME 5 the canonical name for an alias 637 + 638 + SOA 6 marks the start of a zone of authority 639 + 640 + MB 7 a mailbox domain name (EXPERIMENTAL) 641 + 642 + MG 8 a mail group member (EXPERIMENTAL) 643 + 644 + MR 9 a mail rename domain name (EXPERIMENTAL) 645 + 646 + NULL 10 a null RR (EXPERIMENTAL) 647 + 648 + WKS 11 a well known service description 649 + 650 + PTR 12 a domain name pointer 651 + 652 + HINFO 13 host information 653 + 654 + MINFO 14 mailbox or mail list information 655 + 656 + MX 15 mail exchange 657 + 658 + TXT 16 text strings 659 + 660 + 3.2.3. QTYPE values 661 + 662 + QTYPE fields appear in the question part of a query. QTYPES are a 663 + superset of TYPEs, hence all TYPEs are valid QTYPEs. In addition, the 664 + following QTYPEs are defined: 665 + 666 + 667 + 668 + Mockapetris [Page 12] 669 + 670 + RFC 1035 Domain Implementation and Specification November 1987 671 + 672 + 673 + AXFR 252 A request for a transfer of an entire zone 674 + 675 + MAILB 253 A request for mailbox-related records (MB, MG or MR) 676 + 677 + MAILA 254 A request for mail agent RRs (Obsolete - see MX) 678 + 679 + * 255 A request for all records 680 + 681 + 3.2.4. CLASS values 682 + 683 + CLASS fields appear in resource records. The following CLASS mnemonics 684 + and values are defined: 685 + 686 + IN 1 the Internet 687 + 688 + CS 2 the CSNET class (Obsolete - used only for examples in 689 + some obsolete RFCs) 690 + 691 + CH 3 the CHAOS class 692 + 693 + HS 4 Hesiod [Dyer 87] 694 + 695 + 3.2.5. QCLASS values 696 + 697 + QCLASS fields appear in the question section of a query. QCLASS values 698 + are a superset of CLASS values; every CLASS is a valid QCLASS. 
In 699 + addition to CLASS values, the following QCLASSes are defined: 700 + 701 + * 255 any class 702 + 703 + 3.3. Standard RRs 704 + 705 + The following RR definitions are expected to occur, at least 706 + potentially, in all classes. In particular, NS, SOA, CNAME, and PTR 707 + will be used in all classes, and have the same format in all classes. 708 + Because their RDATA format is known, all domain names in the RDATA 709 + section of these RRs may be compressed. 710 + 711 + <domain-name> is a domain name represented as a series of labels, and 712 + terminated by a label with zero length. <character-string> is a single 713 + length octet followed by that number of characters. <character-string> 714 + is treated as binary information, and can be up to 256 characters in 715 + length (including the length octet). 716 + 717 + 718 + 719 + 720 + 721 + 722 + 723 + 724 + Mockapetris [Page 13] 725 + 726 + RFC 1035 Domain Implementation and Specification November 1987 727 + 728 + 729 + 3.3.1. CNAME RDATA format 730 + 731 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 732 + / CNAME / 733 + / / 734 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 735 + 736 + where: 737 + 738 + CNAME A <domain-name> which specifies the canonical or primary 739 + name for the owner. The owner name is an alias. 740 + 741 + CNAME RRs cause no additional section processing, but name servers may 742 + choose to restart the query at the canonical name in certain cases. See 743 + the description of name server logic in [RFC-1034] for details. 744 + 745 + 3.3.2. HINFO RDATA format 746 + 747 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 748 + / CPU / 749 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 750 + / OS / 751 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 752 + 753 + where: 754 + 755 + CPU A <character-string> which specifies the CPU type. 756 + 757 + OS A <character-string> which specifies the operating 758 + system type. 
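As an illustration of the `<character-string>` encoding used by HINFO, here is a small OCaml sketch. It is an assumption-laden example rather than code from this library or the RFC: `encode_character_string` and `encode_hinfo` are hypothetical names, and the sketch produces only the RDATA octets, not a complete resource record.

```ocaml
(* Hypothetical helper: a <character-string> is one length octet
   followed by that many octets of data (so at most 255 data octets). *)
let encode_character_string s =
  let n = String.length s in
  if n > 255 then invalid_arg "character-string longer than 255 octets";
  String.make 1 (Char.chr n) ^ s

(* HINFO RDATA is the CPU <character-string> followed by the OS one. *)
let encode_hinfo ~cpu ~os =
  encode_character_string cpu ^ encode_character_string os
```

For instance, `encode_hinfo ~cpu:"VAX" ~os:"UNIX"` yields a 9-octet string: the length octet 3, the bytes of `VAX`, the length octet 4, and the bytes of `UNIX`.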
759 + 760 + Standard values for CPU and OS can be found in [RFC-1010]. 761 + 762 + HINFO records are used to acquire general information about a host. The 763 + main use is for protocols such as FTP that can use special procedures 764 + when talking between machines or operating systems of the same type. 765 + 766 + 3.3.3. MB RDATA format (EXPERIMENTAL) 767 + 768 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 769 + / MADNAME / 770 + / / 771 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 772 + 773 + where: 774 + 775 + MADNAME A <domain-name> which specifies a host which has the 776 + specified mailbox. 777 + 778 + 779 + 780 + Mockapetris [Page 14] 781 + 782 + RFC 1035 Domain Implementation and Specification November 1987 783 + 784 + 785 + MB records cause additional section processing which looks up an A type 786 + RRs corresponding to MADNAME. 787 + 788 + 3.3.4. MD RDATA format (Obsolete) 789 + 790 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 791 + / MADNAME / 792 + / / 793 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 794 + 795 + where: 796 + 797 + MADNAME A <domain-name> which specifies a host which has a mail 798 + agent for the domain which should be able to deliver 799 + mail for the domain. 800 + 801 + MD records cause additional section processing which looks up an A type 802 + record corresponding to MADNAME. 803 + 804 + MD is obsolete. See the definition of MX and [RFC-974] for details of 805 + the new scheme. The recommended policy for dealing with MD RRs found in 806 + a master file is to reject them, or to convert them to MX RRs with a 807 + preference of 0. 808 + 809 + 3.3.5. MF RDATA format (Obsolete) 810 + 811 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 812 + / MADNAME / 813 + / / 814 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 815 + 816 + where: 817 + 818 + MADNAME A <domain-name> which specifies a host which has a mail 819 + agent for the domain which will accept mail for 820 + forwarding to the domain. 
821 + 822 + MF records cause additional section processing which looks up an A type 823 + record corresponding to MADNAME. 824 + 825 + MF is obsolete. See the definition of MX and [RFC-974] for details of 826 + the new scheme. The recommended policy for dealing with MF RRs found in 827 + a master file is to reject them, or to convert them to MX RRs with a 828 + preference of 10. 829 + 830 + 831 + 832 + 833 + 834 + 835 + 836 + Mockapetris [Page 15] 837 + 838 + RFC 1035 Domain Implementation and Specification November 1987 839 + 840 + 841 + 3.3.6. MG RDATA format (EXPERIMENTAL) 842 + 843 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 844 + / MGMNAME / 845 + / / 846 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 847 + 848 + where: 849 + 850 + MGMNAME A <domain-name> which specifies a mailbox which is a 851 + member of the mail group specified by the domain name. 852 + 853 + MG records cause no additional section processing. 854 + 855 + 3.3.7. MINFO RDATA format (EXPERIMENTAL) 856 + 857 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 858 + / RMAILBX / 859 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 860 + / EMAILBX / 861 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 862 + 863 + where: 864 + 865 + RMAILBX A <domain-name> which specifies a mailbox which is 866 + responsible for the mailing list or mailbox. If this 867 + domain name names the root, the owner of the MINFO RR is 868 + responsible for itself. Note that many existing mailing 869 + lists use a mailbox X-request for the RMAILBX field of 870 + mailing list X, e.g., Msgroup-request for Msgroup. This 871 + field provides a more general mechanism. 872 + 873 + 874 + EMAILBX A <domain-name> which specifies a mailbox which is to 875 + receive error messages related to the mailing list or 876 + mailbox specified by the owner of the MINFO RR (similar 877 + to the ERRORS-TO: field which has been proposed).
If 878 + this domain name names the root, errors should be 879 + returned to the sender of the message. 880 + 881 + MINFO records cause no additional section processing. Although these 882 + records can be associated with a simple mailbox, they are usually used 883 + with a mailing list. 884 + 885 + 886 + 887 + 888 + 889 + 890 + 891 + 892 + Mockapetris [Page 16] 893 + 894 + RFC 1035 Domain Implementation and Specification November 1987 895 + 896 + 897 + 3.3.8. MR RDATA format (EXPERIMENTAL) 898 + 899 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 900 + / NEWNAME / 901 + / / 902 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 903 + 904 + where: 905 + 906 + NEWNAME A <domain-name> which specifies a mailbox which is the 907 + proper rename of the specified mailbox. 908 + 909 + MR records cause no additional section processing. The main use for MR 910 + is as a forwarding entry for a user who has moved to a different 911 + mailbox. 912 + 913 + 3.3.9. MX RDATA format 914 + 915 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 916 + | PREFERENCE | 917 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 918 + / EXCHANGE / 919 + / / 920 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 921 + 922 + where: 923 + 924 + PREFERENCE A 16 bit integer which specifies the preference given to 925 + this RR among others at the same owner. Lower values 926 + are preferred. 927 + 928 + EXCHANGE A <domain-name> which specifies a host willing to act as 929 + a mail exchange for the owner name. 930 + 931 + MX records cause type A additional section processing for the host 932 + specified by EXCHANGE. The use of MX RRs is explained in detail in 933 + [RFC-974]. 934 + 935 + 3.3.10. NULL RDATA format (EXPERIMENTAL) 936 + 937 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 938 + / <anything> / 939 + / / 940 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 941 + 942 + Anything at all may be in the RDATA field so long as it is 65535 octets 943 + or less. 
944 + 945 + 946 + 947 + 948 + Mockapetris [Page 17] 949 + 950 + RFC 1035 Domain Implementation and Specification November 1987 951 + 952 + 953 + NULL records cause no additional section processing. NULL RRs are not 954 + allowed in master files. NULLs are used as placeholders in some 955 + experimental extensions of the DNS. 956 + 957 + 3.3.11. NS RDATA format 958 + 959 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 960 + / NSDNAME / 961 + / / 962 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 963 + 964 + where: 965 + 966 + NSDNAME A <domain-name> which specifies a host which should be 967 + authoritative for the specified class and domain. 968 + 969 + NS records cause both the usual additional section processing to locate 970 + a type A record, and, when used in a referral, a special search of the 971 + zone in which they reside for glue information. 972 + 973 + The NS RR states that the named host should be expected to have a zone 974 + starting at owner name of the specified class. Note that the class may 975 + not indicate the protocol family which should be used to communicate 976 + with the host, although it is typically a strong hint. For example, 977 + hosts which are name servers for either Internet (IN) or Hesiod (HS) 978 + class information are normally queried using IN class protocols. 979 + 980 + 3.3.12. PTR RDATA format 981 + 982 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 983 + / PTRDNAME / 984 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 985 + 986 + where: 987 + 988 + PTRDNAME A <domain-name> which points to some location in the 989 + domain name space. 990 + 991 + PTR records cause no additional section processing. These RRs are used 992 + in special domains to point to some other location in the domain space. 993 + These records are simple data, and don't imply any special processing 994 + similar to that performed by CNAME, which identifies aliases. See the 995 + description of the IN-ADDR.ARPA domain for an example. 
996 + 997 + 998 + 999 + 1000 + 1001 + 1002 + 1003 + 1004 + Mockapetris [Page 18] 1005 + 1006 + RFC 1035 Domain Implementation and Specification November 1987 1007 + 1008 + 1009 + 3.3.13. SOA RDATA format 1010 + 1011 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1012 + / MNAME / 1013 + / / 1014 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1015 + / RNAME / 1016 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1017 + | SERIAL | 1018 + | | 1019 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1020 + | REFRESH | 1021 + | | 1022 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1023 + | RETRY | 1024 + | | 1025 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1026 + | EXPIRE | 1027 + | | 1028 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1029 + | MINIMUM | 1030 + | | 1031 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1032 + 1033 + where: 1034 + 1035 + MNAME The <domain-name> of the name server that was the 1036 + original or primary source of data for this zone. 1037 + 1038 + RNAME A <domain-name> which specifies the mailbox of the 1039 + person responsible for this zone. 1040 + 1041 + SERIAL The unsigned 32 bit version number of the original copy 1042 + of the zone. Zone transfers preserve this value. This 1043 + value wraps and should be compared using sequence space 1044 + arithmetic. 1045 + 1046 + REFRESH A 32 bit time interval before the zone should be 1047 + refreshed. 1048 + 1049 + RETRY A 32 bit time interval that should elapse before a 1050 + failed refresh should be retried. 1051 + 1052 + EXPIRE A 32 bit time value that specifies the upper limit on 1053 + the time interval that can elapse before the zone is no 1054 + longer authoritative. 1055 + 1056 + 1057 + 1058 + 1059 + 1060 + Mockapetris [Page 19] 1061 + 1062 + RFC 1035 Domain Implementation and Specification November 1987 1063 + 1064 + 1065 + MINIMUM The unsigned 32 bit minimum TTL field that should be 1066 + exported with any RR from this zone. 
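The SERIAL field above wraps and is compared "using sequence space arithmetic", which this memo does not spell out. A minimal OCaml sketch follows, assuming the usual signed-difference rule (later formalized in RFC 1982); the name `serial_newer` is hypothetical and not part of this library.

```ocaml
(* Sketch only: compare two 32 bit SOA serials in sequence space.
   Assumption: "newer" means the wrapped difference, read as a signed
   32 bit value, is positive. RFC 1035 itself does not define the rule. *)
let serial_newer (candidate : int32) (current : int32) : bool =
  (* Int32.sub wraps modulo 2^32, so the result is positive exactly when
     candidate is between 1 and 2^31 - 1 steps ahead of current. *)
  Int32.compare (Int32.sub candidate current) 0l > 0
```

With this rule a serial of 1 counts as newer than 4294967295 (written `-1l` as an OCaml `int32`), which is the wrap-around case the field description refers to.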
1067 + 1068 + SOA records cause no additional section processing. 1069 + 1070 + All times are in units of seconds. 1071 + 1072 + Most of these fields are pertinent only for name server maintenance 1073 + operations. However, MINIMUM is used in all query operations that 1074 + retrieve RRs from a zone. Whenever a RR is sent in a response to a 1075 + query, the TTL field is set to the maximum of the TTL field from the RR 1076 + and the MINIMUM field in the appropriate SOA. Thus MINIMUM is a lower 1077 + bound on the TTL field for all RRs in a zone. Note that this use of 1078 + MINIMUM should occur when the RRs are copied into the response and not 1079 + when the zone is loaded from a master file or via a zone transfer. The 1080 + reason for this provison is to allow future dynamic update facilities to 1081 + change the SOA RR with known semantics. 1082 + 1083 + 1084 + 3.3.14. TXT RDATA format 1085 + 1086 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1087 + / TXT-DATA / 1088 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1089 + 1090 + where: 1091 + 1092 + TXT-DATA One or more <character-string>s. 1093 + 1094 + TXT RRs are used to hold descriptive text. The semantics of the text 1095 + depends on the domain where it is found. 1096 + 1097 + 3.4. Internet specific RRs 1098 + 1099 + 3.4.1. A RDATA format 1100 + 1101 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1102 + | ADDRESS | 1103 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1104 + 1105 + where: 1106 + 1107 + ADDRESS A 32 bit Internet address. 1108 + 1109 + Hosts that have multiple Internet addresses will have multiple A 1110 + records. 1111 + 1112 + 1113 + 1114 + 1115 + 1116 + Mockapetris [Page 20] 1117 + 1118 + RFC 1035 Domain Implementation and Specification November 1987 1119 + 1120 + 1121 + A records cause no additional section processing. 
The RDATA section of 1122 + an A line in a master file is an Internet address expressed as four 1123 + decimal numbers separated by dots without any imbedded spaces (e.g., 1124 + "10.2.0.52" or "192.0.5.6"). 1125 + 1126 + 3.4.2. WKS RDATA format 1127 + 1128 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1129 + | ADDRESS | 1130 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1131 + | PROTOCOL | | 1132 + +--+--+--+--+--+--+--+--+ | 1133 + | | 1134 + / <BIT MAP> / 1135 + / / 1136 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1137 + 1138 + where: 1139 + 1140 + ADDRESS An 32 bit Internet address 1141 + 1142 + PROTOCOL An 8 bit IP protocol number 1143 + 1144 + <BIT MAP> A variable length bit map. The bit map must be a 1145 + multiple of 8 bits long. 1146 + 1147 + The WKS record is used to describe the well known services supported by 1148 + a particular protocol on a particular internet address. The PROTOCOL 1149 + field specifies an IP protocol number, and the bit map has one bit per 1150 + port of the specified protocol. The first bit corresponds to port 0, 1151 + the second to port 1, etc. If the bit map does not include a bit for a 1152 + protocol of interest, that bit is assumed zero. The appropriate values 1153 + and mnemonics for ports and protocols are specified in [RFC-1010]. 1154 + 1155 + For example, if PROTOCOL=TCP (6), the 26th bit corresponds to TCP port 1156 + 25 (SMTP). If this bit is set, a SMTP server should be listening on TCP 1157 + port 25; if zero, SMTP service is not supported on the specified 1158 + address. 1159 + 1160 + The purpose of WKS RRs is to provide availability information for 1161 + servers for TCP and UDP. If a server supports both TCP and UDP, or has 1162 + multiple Internet addresses, then multiple WKS RRs are used. 1163 + 1164 + WKS RRs cause no additional section processing. 1165 + 1166 + In master files, both ports and protocols are expressed using mnemonics 1167 + or decimal numbers. 
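The bit map lookup described for WKS can be sketched in OCaml as below. This is an illustration, not library code: the function name `wks_port_listed` is hypothetical, and the sketch assumes that "first bit" means the most significant bit of the first octet.

```ocaml
(* Sketch only: test the bit for [port] in a WKS <BIT MAP>.
   Assumption: bits are numbered from the most significant bit of the
   first octet, so port 0 is mask 0x80 of octet 0.
   Ports that fall beyond the end of the map are treated as zero. *)
let wks_port_listed (bitmap : string) (port : int) : bool =
  let octet = port / 8 in
  if port < 0 || octet >= String.length bitmap then false
  else
    let mask = 0x80 lsr (port mod 8) in
    (Char.code bitmap.[octet]) land mask <> 0
```

Under that assumption, TCP port 25 (the 26th bit) is mask 0x40 of the fourth octet, so a four octet map `"\x00\x00\x00\x40"` advertises SMTP and nothing else.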
1168 + 1169 + 1170 + 1171 + 1172 + Mockapetris [Page 21] 1173 + 1174 + RFC 1035 Domain Implementation and Specification November 1987 1175 + 1176 + 1177 + 3.5. IN-ADDR.ARPA domain 1178 + 1179 + The Internet uses a special domain to support gateway location and 1180 + Internet address to host mapping. Other classes may employ a similar 1181 + strategy in other domains. The intent of this domain is to provide a 1182 + guaranteed method to perform host address to host name mapping, and to 1183 + facilitate queries to locate all gateways on a particular network in the 1184 + Internet. 1185 + 1186 + Note that both of these services are similar to functions that could be 1187 + performed by inverse queries; the difference is that this part of the 1188 + domain name space is structured according to address, and hence can 1189 + guarantee that the appropriate data can be located without an exhaustive 1190 + search of the domain space. 1191 + 1192 + The domain begins at IN-ADDR.ARPA and has a substructure which follows 1193 + the Internet addressing structure. 1194 + 1195 + Domain names in the IN-ADDR.ARPA domain are defined to have up to four 1196 + labels in addition to the IN-ADDR.ARPA suffix. Each label represents 1197 + one octet of an Internet address, and is expressed as a character string 1198 + for a decimal value in the range 0-255 (with leading zeros omitted 1199 + except in the case of a zero octet which is represented by a single 1200 + zero). 1201 + 1202 + Host addresses are represented by domain names that have all four labels 1203 + specified. Thus data for Internet address 10.2.0.52 is located at 1204 + domain name 52.0.2.10.IN-ADDR.ARPA. The reversal, though awkward to 1205 + read, allows zones to be delegated which are exactly one network of 1206 + address space. For example, 10.IN-ADDR.ARPA can be a zone containing 1207 + data for the ARPANET, while 26.IN-ADDR.ARPA can be a separate zone for 1208 + MILNET. 
Address nodes are used to hold pointers to primary host names 1209 + in the normal domain space. 1210 + 1211 + Network numbers correspond to some non-terminal nodes at various depths 1212 + in the IN-ADDR.ARPA domain, since Internet network numbers are either 1, 1213 + 2, or 3 octets. Network nodes are used to hold pointers to the primary 1214 + host names of gateways attached to that network. Since a gateway is, by 1215 + definition, on more than one network, it will typically have two or more 1216 + network nodes which point at it. Gateways will also have host level 1217 + pointers at their fully qualified addresses. 1218 + 1219 + Both the gateway pointers at network nodes and the normal host pointers 1220 + at full address nodes use the PTR RR to point back to the primary domain 1221 + names of the corresponding hosts. 1222 + 1223 + For example, the IN-ADDR.ARPA domain will contain information about the 1224 + ISI gateway between net 10 and 26, an MIT gateway from net 10 to MIT's 1225 + 1226 + 1227 + 1228 + Mockapetris [Page 22] 1229 + 1230 + RFC 1035 Domain Implementation and Specification November 1987 1231 + 1232 + 1233 + net 18, and hosts A.ISI.EDU and MULTICS.MIT.EDU. Assuming that ISI 1234 + gateway has addresses 10.2.0.22 and 26.0.0.103, and a name MILNET- 1235 + GW.ISI.EDU, and the MIT gateway has addresses 10.0.0.77 and 18.10.0.4 1236 + and a name GW.LCS.MIT.EDU, the domain database would contain: 1237 + 1238 + 10.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU. 1239 + 10.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU. 1240 + 18.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU. 1241 + 26.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU. 1242 + 22.0.2.10.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU. 1243 + 103.0.0.26.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU. 1244 + 77.0.0.10.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU. 1245 + 4.0.10.18.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU. 1246 + 103.0.3.26.IN-ADDR.ARPA. PTR A.ISI.EDU. 1247 + 6.0.0.10.IN-ADDR.ARPA. PTR MULTICS.MIT.EDU. 
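The host entries in the listing above follow the reversal rule of this section: the four octets are written in reverse order, in decimal without leading zeros, in front of the IN-ADDR.ARPA suffix. A minimal OCaml sketch, with hypothetical helper names and covering only full four-label host addresses (not the shorter network nodes):

```ocaml
(* Sketch only: parse one decimal octet in the range 0-255. *)
let parse_octet (s : string) : int option =
  let all_digits = ref (s <> "") in
  String.iter (fun c -> if c < '0' || c > '9' then (all_digits := false)) s;
  if (not !all_digits) || String.length s > 3 then None
  else
    let n = int_of_string s in
    if n <= 255 then Some n else None

(* Build the IN-ADDR.ARPA owner name for a dotted-decimal host address.
   Returns None unless the input is exactly four valid octets. *)
let in_addr_arpa_name (addr : string) : string option =
  match List.map parse_octet (String.split_on_char '.' addr) with
  | [ Some a; Some b; Some c; Some d ] ->
      (* Octets appear in reverse order; %d prints no leading zeros. *)
      Some (Printf.sprintf "%d.%d.%d.%d.IN-ADDR.ARPA." d c b a)
  | _ -> None
```

For the address 10.2.0.52 used in this section, the sketch yields `52.0.2.10.IN-ADDR.ARPA.`.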
1248 + 1249 + Thus a program which wanted to locate gateways on net 10 would originate 1250 + a query of the form QTYPE=PTR, QCLASS=IN, QNAME=10.IN-ADDR.ARPA. It 1251 + would receive two RRs in response: 1252 + 1253 + 10.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU. 1254 + 10.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU. 1255 + 1256 + The program could then originate QTYPE=A, QCLASS=IN queries for MILNET- 1257 + GW.ISI.EDU. and GW.LCS.MIT.EDU. to discover the Internet addresses of 1258 + these gateways. 1259 + 1260 + A resolver which wanted to find the host name corresponding to Internet 1261 + host address 10.0.0.6 would pursue a query of the form QTYPE=PTR, 1262 + QCLASS=IN, QNAME=6.0.0.10.IN-ADDR.ARPA, and would receive: 1263 + 1264 + 6.0.0.10.IN-ADDR.ARPA. PTR MULTICS.MIT.EDU. 1265 + 1266 + Several cautions apply to the use of these services: 1267 + - Since the IN-ADDR.ARPA special domain and the normal domain 1268 + for a particular host or gateway will be in different zones, 1269 + the possibility exists that the data may be inconsistent. 1270 + 1271 + - Gateways will often have two names in separate domains, only 1272 + one of which can be primary. 1273 + 1274 + - Systems that use the domain database to initialize their 1275 + routing tables must start with enough gateway information to 1276 + guarantee that they can access the appropriate name server. 1277 + 1278 + - The gateway data only reflects the existence of a gateway in a 1279 + manner equivalent to the current HOSTS.TXT file. It doesn't 1280 + replace the dynamic availability information from GGP or EGP. 1281 + 1282 + 1283 + 1284 + Mockapetris [Page 23] 1285 + 1286 + RFC 1035 Domain Implementation and Specification November 1987 1287 + 1288 + 1289 + 3.6. Defining new types, classes, and special namespaces 1290 + 1291 + The previously defined types and classes are the ones in use as of the 1292 + date of this memo. New definitions should be expected.
This section 1293 + makes some recommendations to designers considering additions to the 1294 + existing facilities. The mailing list NAMEDROPPERS@SRI-NIC.ARPA is the 1295 + forum where general discussion of design issues takes place. 1296 + 1297 + In general, a new type is appropriate when new information is to be 1298 + added to the database about an existing object, or we need new data 1299 + formats for some totally new object. Designers should attempt to define 1300 + types and their RDATA formats that are generally applicable to all 1301 + classes, and which avoid duplication of information. New classes are 1302 + appropriate when the DNS is to be used for a new protocol, etc which 1303 + requires new class-specific data formats, or when a copy of the existing 1304 + name space is desired, but a separate management domain is necessary. 1305 + 1306 + New types and classes need mnemonics for master files; the format of the 1307 + master files requires that the mnemonics for type and class be disjoint. 1308 + 1309 + TYPE and CLASS values must be a proper subset of QTYPEs and QCLASSes 1310 + respectively. 1311 + 1312 + The present system uses multiple RRs to represent multiple values of a 1313 + type rather than storing multiple values in the RDATA section of a 1314 + single RR. This is less efficient for most applications, but does keep 1315 + RRs shorter. The multiple RRs assumption is incorporated in some 1316 + experimental work on dynamic update methods. 1317 + 1318 + The present system attempts to minimize the duplication of data in the 1319 + database in order to insure consistency. Thus, in order to find the 1320 + address of the host for a mail exchange, you map the mail domain name to 1321 + a host name, then the host name to addresses, rather than a direct 1322 + mapping to host address. This approach is preferred because it avoids 1323 + the opportunity for inconsistency. 
1324 + 1325 + In defining a new type of data, multiple RR types should not be used to 1326 + create an ordering between entries or express different formats for 1327 + equivalent bindings, instead this information should be carried in the 1328 + body of the RR and a single type used. This policy avoids problems with 1329 + caching multiple types and defining QTYPEs to match multiple types. 1330 + 1331 + For example, the original form of mail exchange binding used two RR 1332 + types one to represent a "closer" exchange (MD) and one to represent a 1333 + "less close" exchange (MF). The difficulty is that the presence of one 1334 + RR type in a cache doesn't convey any information about the other 1335 + because the query which acquired the cached information might have used 1336 + a QTYPE of MF, MD, or MAILA (which matched both). The redesigned 1337 + 1338 + 1339 + 1340 + Mockapetris [Page 24] 1341 + 1342 + RFC 1035 Domain Implementation and Specification November 1987 1343 + 1344 + 1345 + service used a single type (MX) with a "preference" value in the RDATA 1346 + section which can order different RRs. However, if any MX RRs are found 1347 + in the cache, then all should be there. 1348 + 1349 + 4. MESSAGES 1350 + 1351 + 4.1. Format 1352 + 1353 + All communications inside of the domain protocol are carried in a single 1354 + format called a message. The top level format of message is divided 1355 + into 5 sections (some of which are empty in certain cases) shown below: 1356 + 1357 + +---------------------+ 1358 + | Header | 1359 + +---------------------+ 1360 + | Question | the question for the name server 1361 + +---------------------+ 1362 + | Answer | RRs answering the question 1363 + +---------------------+ 1364 + | Authority | RRs pointing toward an authority 1365 + +---------------------+ 1366 + | Additional | RRs holding additional information 1367 + +---------------------+ 1368 + 1369 + The header section is always present. 
The header includes fields that 1370 + specify which of the remaining sections are present, and also specify 1371 + whether the message is a query or a response, a standard query or some 1372 + other opcode, etc. 1373 + 1374 + The names of the sections after the header are derived from their use in 1375 + standard queries. The question section contains fields that describe a 1376 + question to a name server. These fields are a query type (QTYPE), a 1377 + query class (QCLASS), and a query domain name (QNAME). The last three 1378 + sections have the same format: a possibly empty list of concatenated 1379 + resource records (RRs). The answer section contains RRs that answer the 1380 + question; the authority section contains RRs that point toward an 1381 + authoritative name server; the additional records section contains RRs 1382 + which relate to the query, but are not strictly answers for the 1383 + question. 1384 + 1385 + 1386 + 1387 + 1388 + 1389 + 1390 + 1391 + 1392 + 1393 + 1394 + 1395 + 1396 + Mockapetris [Page 25] 1397 + 1398 + RFC 1035 Domain Implementation and Specification November 1987 1399 + 1400 + 1401 + 4.1.1. Header section format 1402 + 1403 + The header contains the following fields: 1404 + 1405 + 1 1 1 1 1 1 1406 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 1407 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1408 + | ID | 1409 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1410 + |QR| Opcode |AA|TC|RD|RA| Z | RCODE | 1411 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1412 + | QDCOUNT | 1413 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1414 + | ANCOUNT | 1415 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1416 + | NSCOUNT | 1417 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1418 + | ARCOUNT | 1419 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1420 + 1421 + where: 1422 + 1423 + ID A 16 bit identifier assigned by the program that 1424 + generates any kind of query. 
This identifier is copied 1425 + the corresponding reply and can be used by the requester 1426 + to match up replies to outstanding queries. 1427 + 1428 + QR A one bit field that specifies whether this message is a 1429 + query (0), or a response (1). 1430 + 1431 + OPCODE A four bit field that specifies kind of query in this 1432 + message. This value is set by the originator of a query 1433 + and copied into the response. The values are: 1434 + 1435 + 0 a standard query (QUERY) 1436 + 1437 + 1 an inverse query (IQUERY) 1438 + 1439 + 2 a server status request (STATUS) 1440 + 1441 + 3-15 reserved for future use 1442 + 1443 + AA Authoritative Answer - this bit is valid in responses, 1444 + and specifies that the responding name server is an 1445 + authority for the domain name in question section. 1446 + 1447 + Note that the contents of the answer section may have 1448 + multiple owner names because of aliases. The AA bit 1449 + 1450 + 1451 + 1452 + Mockapetris [Page 26] 1453 + 1454 + RFC 1035 Domain Implementation and Specification November 1987 1455 + 1456 + 1457 + corresponds to the name which matches the query name, or 1458 + the first owner name in the answer section. 1459 + 1460 + TC TrunCation - specifies that this message was truncated 1461 + due to length greater than that permitted on the 1462 + transmission channel. 1463 + 1464 + RD Recursion Desired - this bit may be set in a query and 1465 + is copied into the response. If RD is set, it directs 1466 + the name server to pursue the query recursively. 1467 + Recursive query support is optional. 1468 + 1469 + RA Recursion Available - this be is set or cleared in a 1470 + response, and denotes whether recursive query support is 1471 + available in the name server. 1472 + 1473 + Z Reserved for future use. Must be zero in all queries 1474 + and responses. 1475 + 1476 + RCODE Response code - this 4 bit field is set as part of 1477 + responses. 
The values have the following 1478 + interpretation: 1479 + 1480 + 0 No error condition 1481 + 1482 + 1 Format error - The name server was 1483 + unable to interpret the query. 1484 + 1485 + 2 Server failure - The name server was 1486 + unable to process this query due to a 1487 + problem with the name server. 1488 + 1489 + 3 Name Error - Meaningful only for 1490 + responses from an authoritative name 1491 + server, this code signifies that the 1492 + domain name referenced in the query does 1493 + not exist. 1494 + 1495 + 4 Not Implemented - The name server does 1496 + not support the requested kind of query. 1497 + 1498 + 5 Refused - The name server refuses to 1499 + perform the specified operation for 1500 + policy reasons. For example, a name 1501 + server may not wish to provide the 1502 + information to the particular requester, 1503 + or a name server may not wish to perform 1504 + a particular operation (e.g., zone 1505 + 1506 + 1507 + 1508 + Mockapetris [Page 27] 1509 + 1510 + RFC 1035 Domain Implementation and Specification November 1987 1511 + 1512 + 1513 + transfer) for particular data. 1514 + 1515 + 6-15 Reserved for future use. 1516 + 1517 + QDCOUNT an unsigned 16 bit integer specifying the number of 1518 + entries in the question section. 1519 + 1520 + ANCOUNT an unsigned 16 bit integer specifying the number of 1521 + resource records in the answer section. 1522 + 1523 + NSCOUNT an unsigned 16 bit integer specifying the number of name 1524 + server resource records in the authority records 1525 + section. 1526 + 1527 + ARCOUNT an unsigned 16 bit integer specifying the number of 1528 + resource records in the additional records section. 1529 + 1530 + 4.1.2. Question section format 1531 + 1532 + The question section is used to carry the "question" in most queries, 1533 + i.e., the parameters that define what is being asked. 
The section 1534 + contains QDCOUNT (usually 1) entries, each of the following format: 1535 + 1536 + 1 1 1 1 1 1 1537 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 1538 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1539 + | | 1540 + / QNAME / 1541 + / / 1542 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1543 + | QTYPE | 1544 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1545 + | QCLASS | 1546 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1547 + 1548 + where: 1549 + 1550 + QNAME a domain name represented as a sequence of labels, where 1551 + each label consists of a length octet followed by that 1552 + number of octets. The domain name terminates with the 1553 + zero length octet for the null label of the root. Note 1554 + that this field may be an odd number of octets; no 1555 + padding is used. 1556 + 1557 + QTYPE a two octet code which specifies the type of the query. 1558 + The values for this field include all codes valid for a 1559 + TYPE field, together with some more general codes which 1560 + can match more than one type of RR. 1561 + 1562 + 1563 + 1564 + Mockapetris [Page 28] 1565 + 1566 + RFC 1035 Domain Implementation and Specification November 1987 1567 + 1568 + 1569 + QCLASS a two octet code that specifies the class of the query. 1570 + For example, the QCLASS field is IN for the Internet. 1571 + 1572 + 4.1.3. Resource record format 1573 + 1574 + The answer, authority, and additional sections all share the same 1575 + format: a variable number of resource records, where the number of 1576 + records is specified in the corresponding count field in the header. 
1577 + Each resource record has the following format: 1578 + 1 1 1 1 1 1 1579 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 1580 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1581 + | | 1582 + / / 1583 + / NAME / 1584 + | | 1585 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1586 + | TYPE | 1587 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1588 + | CLASS | 1589 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1590 + | TTL | 1591 + | | 1592 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1593 + | RDLENGTH | 1594 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--| 1595 + / RDATA / 1596 + / / 1597 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1598 + 1599 + where: 1600 + 1601 + NAME a domain name to which this resource record pertains. 1602 + 1603 + TYPE two octets containing one of the RR type codes. This 1604 + field specifies the meaning of the data in the RDATA 1605 + field. 1606 + 1607 + CLASS two octets which specify the class of the data in the 1608 + RDATA field. 1609 + 1610 + TTL a 32 bit unsigned integer that specifies the time 1611 + interval (in seconds) that the resource record may be 1612 + cached before it should be discarded. Zero values are 1613 + interpreted to mean that the RR can only be used for the 1614 + transaction in progress, and should not be cached. 1615 + 1616 + 1617 + 1618 + 1619 + 1620 + Mockapetris [Page 29] 1621 + 1622 + RFC 1035 Domain Implementation and Specification November 1987 1623 + 1624 + 1625 + RDLENGTH an unsigned 16 bit integer that specifies the length in 1626 + octets of the RDATA field. 1627 + 1628 + RDATA a variable length string of octets that describes the 1629 + resource. The format of this information varies 1630 + according to the TYPE and CLASS of the resource record. 1631 + For example, the if the TYPE is A and the CLASS is IN, 1632 + the RDATA field is a 4 octet ARPA Internet address. 1633 + 1634 + 4.1.4. 
Message compression 1635 + 1636 + In order to reduce the size of messages, the domain system utilizes a 1637 + compression scheme which eliminates the repetition of domain names in a 1638 + message. In this scheme, an entire domain name or a list of labels at 1639 + the end of a domain name is replaced with a pointer to a prior occurrence 1640 + of the same name. 1641 + 1642 + The pointer takes the form of a two octet sequence: 1643 + 1644 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1645 + | 1 1| OFFSET | 1646 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1647 + 1648 + The first two bits are ones. This allows a pointer to be distinguished 1649 + from a label, since the label must begin with two zero bits because 1650 + labels are restricted to 63 octets or less. (The 10 and 01 combinations 1651 + are reserved for future use.) The OFFSET field specifies an offset from 1652 + the start of the message (i.e., the first octet of the ID field in the 1653 + domain header). A zero offset specifies the first byte of the ID field, 1654 + etc. 1655 + 1656 + The compression scheme allows a domain name in a message to be 1657 + represented as either: 1658 + 1659 + - a sequence of labels ending in a zero octet 1660 + 1661 + - a pointer 1662 + 1663 + - a sequence of labels ending with a pointer 1664 + 1665 + Pointers can only be used for occurrences of a domain name where the 1666 + format is not class specific. If this were not the case, a name server 1667 + or resolver would be required to know the format of all RRs it handled. 1668 + As yet, there are no such cases, but they may occur in future RDATA 1669 + formats.
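The three representations listed above (labels ending in a zero octet, a bare pointer, labels ending with a pointer) can all be read by one loop. The OCaml sketch below is illustrative only: `read_name` is a hypothetical name, and the limit of 128 pointer hops is an assumption added to guard against pointer loops, which this memo does not discuss.

```ocaml
(* Sketch only: read the domain name that starts at [pos] in [msg],
   following compression pointers. Returns the labels in order, or None
   on malformed input. The hop limit is an assumption of this sketch. *)
let read_name (msg : string) (pos : int) : string list option =
  let len = String.length msg in
  let rec go pos hops acc =
    if pos < 0 || pos >= len || hops > 128 then None
    else
      let b = Char.code msg.[pos] in
      match b land 0xC0 with
      | 0x00 when b = 0 ->
          (* Zero length octet: the root label ends the name. *)
          Some (List.rev acc)
      | 0x00 ->
          (* Ordinary label: length octet [b], then [b] octets of data. *)
          if pos + 1 + b > len then None
          else go (pos + 1 + b) hops (String.sub msg (pos + 1) b :: acc)
      | 0xC0 ->
          (* Pointer: the low 14 bits are an offset from the message start. *)
          if pos + 1 >= len then None
          else
            let offset = ((b land 0x3F) lsl 8) lor Char.code msg.[pos + 1] in
            go offset (hops + 1) acc
      | _ ->
          (* The 10 and 01 combinations are reserved. *)
          None
  in
  go pos 0 []
```

Applied to the F.ISI.ARPA layout used as the example in this section, reading at offset 40 would follow the pointer to offset 20 and return the labels FOO, F, ISI, ARPA; reading at offset 92 would return the empty list for the root.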
1670 +
1671 + If a domain name is contained in a part of the message subject to a
1672 + length field (such as the RDATA section of an RR), and compression is
1673 +
1674 +
1675 +
1676 + Mockapetris                                                    [Page 30]
1677 +
1678 + RFC 1035        Domain Implementation and Specification    November 1987
1679 +
1680 +
1681 + used, the length of the compressed name is used in the length
1682 + calculation, rather than the length of the expanded name.
1683 +
1684 + Programs are free to avoid using pointers in messages they generate,
1685 + although this will reduce datagram capacity, and may cause truncation.
1686 + However all programs are required to understand arriving messages that
1687 + contain pointers.
1688 +
1689 + For example, a datagram might need to use the domain names F.ISI.ARPA,
1690 + FOO.F.ISI.ARPA, ARPA, and the root. Ignoring the other fields of the
1691 + message, these domain names might be represented as:
1692 +
1693 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1694 +     20 |           1           |           F           |
1695 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1696 +     22 |           3           |           I           |
1697 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1698 +     24 |           S           |           I           |
1699 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1700 +     26 |           4           |           A           |
1701 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1702 +     28 |           R           |           P           |
1703 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1704 +     30 |           A           |           0           |
1705 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1706 +
1707 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1708 +     40 |           3           |           F           |
1709 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1710 +     42 |           O           |           O           |
1711 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1712 +     44 | 1  1|                20                       |
1713 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1714 +
1715 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1716 +     64 | 1  1|                26                       |
1717 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1718 +
1719 +        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1720 +     92 |           0           |                       |
1721 +
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1722 + 1723 + The domain name for F.ISI.ARPA is shown at offset 20. The domain name 1724 + FOO.F.ISI.ARPA is shown at offset 40; this definition uses a pointer to 1725 + concatenate a label for FOO to the previously defined F.ISI.ARPA. The 1726 + domain name ARPA is defined at offset 64 using a pointer to the ARPA 1727 + component of the name F.ISI.ARPA at 20; note that this pointer relies on 1728 + ARPA being the last label in the string at 20. The root domain name is 1729 + 1730 + 1731 + 1732 + Mockapetris [Page 31] 1733 + 1734 + RFC 1035 Domain Implementation and Specification November 1987 1735 + 1736 + 1737 + defined by a single octet of zeros at 92; the root domain name has no 1738 + labels. 1739 + 1740 + 4.2. Transport 1741 + 1742 + The DNS assumes that messages will be transmitted as datagrams or in a 1743 + byte stream carried by a virtual circuit. While virtual circuits can be 1744 + used for any DNS activity, datagrams are preferred for queries due to 1745 + their lower overhead and better performance. Zone refresh activities 1746 + must use virtual circuits because of the need for reliable transfer. 1747 + 1748 + The Internet supports name server access using TCP [RFC-793] on server 1749 + port 53 (decimal) as well as datagram access using UDP [RFC-768] on UDP 1750 + port 53 (decimal). 1751 + 1752 + 4.2.1. UDP usage 1753 + 1754 + Messages sent using UDP user server port 53 (decimal). 1755 + 1756 + Messages carried by UDP are restricted to 512 bytes (not counting the IP 1757 + or UDP headers). Longer messages are truncated and the TC bit is set in 1758 + the header. 1759 + 1760 + UDP is not acceptable for zone transfers, but is the recommended method 1761 + for standard queries in the Internet. Queries sent using UDP may be 1762 + lost, and hence a retransmission strategy is required. 
Queries or their 1763 + responses may be reordered by the network, or by processing in name 1764 + servers, so resolvers should not depend on them being returned in order. 1765 + 1766 + The optimal UDP retransmission policy will vary with performance of the 1767 + Internet and the needs of the client, but the following are recommended: 1768 + 1769 + - The client should try other servers and server addresses 1770 + before repeating a query to a specific address of a server. 1771 + 1772 + - The retransmission interval should be based on prior 1773 + statistics if possible. Too aggressive retransmission can 1774 + easily slow responses for the community at large. Depending 1775 + on how well connected the client is to its expected servers, 1776 + the minimum retransmission interval should be 2-5 seconds. 1777 + 1778 + More suggestions on server selection and retransmission policy can be 1779 + found in the resolver section of this memo. 1780 + 1781 + 4.2.2. TCP usage 1782 + 1783 + Messages sent over TCP connections use server port 53 (decimal). The 1784 + message is prefixed with a two byte length field which gives the message 1785 + 1786 + 1787 + 1788 + Mockapetris [Page 32] 1789 + 1790 + RFC 1035 Domain Implementation and Specification November 1987 1791 + 1792 + 1793 + length, excluding the two byte length field. This length field allows 1794 + the low-level processing to assemble a complete message before beginning 1795 + to parse it. 1796 + 1797 + Several connection management policies are recommended: 1798 + 1799 + - The server should not block other activities waiting for TCP 1800 + data. 1801 + 1802 + - The server should support multiple connections. 1803 + 1804 + - The server should assume that the client will initiate 1805 + connection closing, and should delay closing its end of the 1806 + connection until all outstanding client requests have been 1807 + satisfied. 
1808 + 1809 + - If the server needs to close a dormant connection to reclaim 1810 + resources, it should wait until the connection has been idle 1811 + for a period on the order of two minutes. In particular, the 1812 + server should allow the SOA and AXFR request sequence (which 1813 + begins a refresh operation) to be made on a single connection. 1814 + Since the server would be unable to answer queries anyway, a 1815 + unilateral close or reset may be used instead of a graceful 1816 + close. 1817 + 1818 + 5. MASTER FILES 1819 + 1820 + Master files are text files that contain RRs in text form. Since the 1821 + contents of a zone can be expressed in the form of a list of RRs a 1822 + master file is most often used to define a zone, though it can be used 1823 + to list a cache's contents. Hence, this section first discusses the 1824 + format of RRs in a master file, and then the special considerations when 1825 + a master file is used to create a zone in some name server. 1826 + 1827 + 5.1. Format 1828 + 1829 + The format of these files is a sequence of entries. Entries are 1830 + predominantly line-oriented, though parentheses can be used to continue 1831 + a list of items across a line boundary, and text literals can contain 1832 + CRLF within the text. Any combination of tabs and spaces act as a 1833 + delimiter between the separate items that make up an entry. The end of 1834 + any line in the master file can end with a comment. The comment starts 1835 + with a ";" (semicolon). 
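The delimiter and comment rules in the paragraph above can be sketched as a small OCaml tokenizer. This is a simplified illustration under stated assumptions, not the parser used by any real name server: the name `split_items` is hypothetical, parentheses and multi-line text literals are not handled, escape pairs are kept verbatim rather than decoded, and leading whitespace (which the master file format treats as significant for RR entries) is discarded.

```ocaml
(* Sketch: split one master-file line into its items. Tabs and spaces
   delimit items, and an unquoted, unescaped ';' starts a comment that
   runs to the end of the line. *)
let split_items (line : string) : string list =
  let n = String.length line in
  let buf = Buffer.create 16 in
  let items = ref [] in
  let flush () =
    if Buffer.length buf > 0 then begin
      items := Buffer.contents buf :: !items;
      Buffer.clear buf
    end
  in
  let rec go i in_quote =
    if i >= n then flush ()
    else
      match line.[i] with
      | '\\' when i + 1 < n ->
        (* Keep the escape pair as written; decoding is a later step. *)
        Buffer.add_char buf '\\';
        Buffer.add_char buf line.[i + 1];
        go (i + 2) in_quote
      | '"' ->
        Buffer.add_char buf '"';
        go (i + 1) (not in_quote)
      | (' ' | '\t') when not in_quote ->
        flush ();
        go (i + 1) in_quote
      | ';' when not in_quote -> flush ()
      | c ->
        Buffer.add_char buf c;
        go (i + 1) in_quote
  in
  go 0 false;
  List.rev !items
```

For instance, `split_items "VENERA  A\t10.1.0.52 ; host"` yields the three items `VENERA`, `A` and `10.1.0.52`, and a line holding only a comment yields the empty list.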
1836 + 1837 + The following entries are defined: 1838 + 1839 + <blank>[<comment>] 1840 + 1841 + 1842 + 1843 + 1844 + Mockapetris [Page 33] 1845 + 1846 + RFC 1035 Domain Implementation and Specification November 1987 1847 + 1848 + 1849 + $ORIGIN <domain-name> [<comment>] 1850 + 1851 + $INCLUDE <file-name> [<domain-name>] [<comment>] 1852 + 1853 + <domain-name><rr> [<comment>] 1854 + 1855 + <blank><rr> [<comment>] 1856 + 1857 + Blank lines, with or without comments, are allowed anywhere in the file. 1858 + 1859 + Two control entries are defined: $ORIGIN and $INCLUDE. $ORIGIN is 1860 + followed by a domain name, and resets the current origin for relative 1861 + domain names to the stated name. $INCLUDE inserts the named file into 1862 + the current file, and may optionally specify a domain name that sets the 1863 + relative domain name origin for the included file. $INCLUDE may also 1864 + have a comment. Note that a $INCLUDE entry never changes the relative 1865 + origin of the parent file, regardless of changes to the relative origin 1866 + made within the included file. 1867 + 1868 + The last two forms represent RRs. If an entry for an RR begins with a 1869 + blank, then the RR is assumed to be owned by the last stated owner. If 1870 + an RR entry begins with a <domain-name>, then the owner name is reset. 1871 + 1872 + <rr> contents take one of the following forms: 1873 + 1874 + [<TTL>] [<class>] <type> <RDATA> 1875 + 1876 + [<class>] [<TTL>] <type> <RDATA> 1877 + 1878 + The RR begins with optional TTL and class fields, followed by a type and 1879 + RDATA field appropriate to the type and class. Class and type use the 1880 + standard mnemonics, TTL is a decimal integer. Omitted class and TTL 1881 + values are default to the last explicitly stated values. Since type and 1882 + class mnemonics are disjoint, the parse is unique. 
(Note that this 1883 + order is different from the order used in examples and the order used in 1884 + the actual RRs; the given order allows easier parsing and defaulting.) 1885 + 1886 + <domain-name>s make up a large share of the data in the master file. 1887 + The labels in the domain name are expressed as character strings and 1888 + separated by dots. Quoting conventions allow arbitrary characters to be 1889 + stored in domain names. Domain names that end in a dot are called 1890 + absolute, and are taken as complete. Domain names which do not end in a 1891 + dot are called relative; the actual domain name is the concatenation of 1892 + the relative part with an origin specified in a $ORIGIN, $INCLUDE, or as 1893 + an argument to the master file loading routine. A relative name is an 1894 + error when no origin is available. 1895 + 1896 + 1897 + 1898 + 1899 + 1900 + Mockapetris [Page 34] 1901 + 1902 + RFC 1035 Domain Implementation and Specification November 1987 1903 + 1904 + 1905 + <character-string> is expressed in one or two ways: as a contiguous set 1906 + of characters without interior spaces, or as a string beginning with a " 1907 + and ending with a ". Inside a " delimited string any character can 1908 + occur, except for a " itself, which must be quoted using \ (back slash). 1909 + 1910 + Because these files are text files several special encodings are 1911 + necessary to allow arbitrary data to be loaded. In particular: 1912 + 1913 + of the root. 1914 + 1915 + @ A free standing @ is used to denote the current origin. 1916 + 1917 + \X where X is any character other than a digit (0-9), is 1918 + used to quote that character so that its special meaning 1919 + does not apply. For example, "\." can be used to place 1920 + a dot character in a label. 1921 + 1922 + \DDD where each D is a digit is the octet corresponding to 1923 + the decimal number described by DDD. 
The resulting
1924 +                 octet is assumed to be text and is not checked for
1925 +                 special meaning.
1926 +
1927 + ( )             Parentheses are used to group data that crosses a line
1928 +                 boundary. In effect, line terminations are not
1929 +                 recognized within parentheses.
1930 +
1931 + ;               Semicolon is used to start a comment; the remainder of
1932 +                 the line is ignored.
1933 +
1934 + 5.2. Use of master files to define zones
1935 +
1936 + When a master file is used to load a zone, the operation should be
1937 + suppressed if any errors are encountered in the master file. The
1938 + rationale for this is that a single error can have widespread
1939 + consequences. For example, suppose that the RRs defining a delegation
1940 + have syntax errors; then the server will return authoritative name
1941 + errors for all names in the subzone (except in the case where the
1942 + subzone is also present on the server).
1943 +
1944 + Several other validity checks that should be performed in addition to
1945 + insuring that the file is syntactically correct:
1946 +
1947 +    1. All RRs in the file should have the same class.
1948 +
1949 +    2. Exactly one SOA RR should be present at the top of the zone.
1950 +
1951 +    3. If delegations are present and glue information is required,
1952 +       it should be present.
1953 +
1954 +
1955 +
1956 + Mockapetris                                                    [Page 35]
1957 +
1958 + RFC 1035        Domain Implementation and Specification    November 1987
1959 +
1960 +
1961 +    4. Information present outside of the authoritative nodes in the
1962 +       zone should be glue information, rather than the result of an
1963 +       origin or similar error.
1964 +
1965 + 5.3. Master file example
1966 +
1967 + The following is an example file which might be used to define the
1968 + ISI.EDU zone.and is loaded with an origin of ISI.EDU:
1969 +
1970 + @   IN  SOA     VENERA      Action\.domains (
1971 +                                  20     ; SERIAL
1972 +                                  7200   ; REFRESH
1973 +                                  600    ; RETRY
1974 +                                  3600000; EXPIRE
1975 +                                  60)    ; MINIMUM
1976 +
1977 +         NS      A.ISI.EDU.
1978 +         NS      VENERA
1979 +         NS      VAXA
1980 +         MX      10      VENERA
1981 +         MX      20      VAXA
1982 +
1983 + A       A       26.3.0.103
1984 +
1985 + VENERA  A       10.1.0.52
1986 +         A       128.9.0.32
1987 +
1988 + VAXA    A       10.2.0.27
1989 +         A       128.9.0.33
1990 +
1991 +
1992 + $INCLUDE <SUBSYS>ISI-MAILBOXES.TXT
1993 +
1994 + Where the file <SUBSYS>ISI-MAILBOXES.TXT is:
1995 +
1996 +     MOE     MB      A.ISI.EDU.
1997 +     LARRY   MB      A.ISI.EDU.
1998 +     CURLEY  MB      A.ISI.EDU.
1999 +     STOOGES MG      MOE
2000 +             MG      LARRY
2001 +             MG      CURLEY
2002 +
2003 + Note the use of the \ character in the SOA RR to specify the responsible
2004 + person mailbox "Action.domains@E.ISI.EDU".
2005 +
2006 +
2007 +
2008 +
2009 +
2010 +
2011 +
2012 + Mockapetris                                                    [Page 36]
2013 +
2014 + RFC 1035        Domain Implementation and Specification    November 1987
2015 +
2016 +
2017 + 6. NAME SERVER IMPLEMENTATION
2018 +
2019 + 6.1. Architecture
2020 +
2021 + The optimal structure for the name server will depend on the host
2022 + operating system and whether the name server is integrated with resolver
2023 + operations, either by supporting recursive service, or by sharing its
2024 + database with a resolver. This section discusses implementation
2025 + considerations for a name server which shares a database with a
2026 + resolver, but most of these concerns are present in any name server.
2027 +
2028 + 6.1.1. Control
2029 +
2030 + A name server must employ multiple concurrent activities, whether they
2031 + are implemented as separate tasks in the host's OS or multiplexing
2032 + inside a single name server program. It is simply not acceptable for a
2033 + name server to block the service of UDP requests while it waits for TCP
2034 + data for refreshing or query activities. Similarly, a name server
2035 + should not attempt to provide recursive service without processing such
2036 + requests in parallel, though it may choose to serialize requests from a
2037 + single client, or to regard identical requests from the same client as
2038 + duplicates.
A name server should not substantially delay requests while 2039 + it reloads a zone from master files or while it incorporates a newly 2040 + refreshed zone into its database. 2041 + 2042 + 6.1.2. Database 2043 + 2044 + While name server implementations are free to use any internal data 2045 + structures they choose, the suggested structure consists of three major 2046 + parts: 2047 + 2048 + - A "catalog" data structure which lists the zones available to 2049 + this server, and a "pointer" to the zone data structure. The 2050 + main purpose of this structure is to find the nearest ancestor 2051 + zone, if any, for arriving standard queries. 2052 + 2053 + - Separate data structures for each of the zones held by the 2054 + name server. 2055 + 2056 + - A data structure for cached data. (or perhaps separate caches 2057 + for different classes) 2058 + 2059 + All of these data structures can be implemented an identical tree 2060 + structure format, with different data chained off the nodes in different 2061 + parts: in the catalog the data is pointers to zones, while in the zone 2062 + and cache data structures, the data will be RRs. In designing the tree 2063 + framework the designer should recognize that query processing will need 2064 + to traverse the tree using case-insensitive label comparisons; and that 2065 + 2066 + 2067 + 2068 + Mockapetris [Page 37] 2069 + 2070 + RFC 1035 Domain Implementation and Specification November 1987 2071 + 2072 + 2073 + in real data, a few nodes have a very high branching factor (100-1000 or 2074 + more), but the vast majority have a very low branching factor (0-1). 2075 + 2076 + One way to solve the case problem is to store the labels for each node 2077 + in two pieces: a standardized-case representation of the label where all 2078 + ASCII characters are in a single case, together with a bit mask that 2079 + denotes which characters are actually of a different case. 
The 2080 + branching factor diversity can be handled using a simple linked list for 2081 + a node until the branching factor exceeds some threshold, and 2082 + transitioning to a hash structure after the threshold is exceeded. In 2083 + any case, hash structures used to store tree sections must insure that 2084 + hash functions and procedures preserve the casing conventions of the 2085 + DNS. 2086 + 2087 + The use of separate structures for the different parts of the database 2088 + is motivated by several factors: 2089 + 2090 + - The catalog structure can be an almost static structure that 2091 + need change only when the system administrator changes the 2092 + zones supported by the server. This structure can also be 2093 + used to store parameters used to control refreshing 2094 + activities. 2095 + 2096 + - The individual data structures for zones allow a zone to be 2097 + replaced simply by changing a pointer in the catalog. Zone 2098 + refresh operations can build a new structure and, when 2099 + complete, splice it into the database via a simple pointer 2100 + replacement. It is very important that when a zone is 2101 + refreshed, queries should not use old and new data 2102 + simultaneously. 2103 + 2104 + - With the proper search procedures, authoritative data in zones 2105 + will always "hide", and hence take precedence over, cached 2106 + data. 2107 + 2108 + - Errors in zone definitions that cause overlapping zones, etc., 2109 + may cause erroneous responses to queries, but problem 2110 + determination is simplified, and the contents of one "bad" 2111 + zone can't corrupt another. 2112 + 2113 + - Since the cache is most frequently updated, it is most 2114 + vulnerable to corruption during system restarts. It can also 2115 + become full of expired RR data. In either case, it can easily 2116 + be discarded without disturbing zone data. 
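The case-handling idea described earlier in this subsection (a standardized-case copy of each label plus a mask of the characters that differ in case) might look like the OCaml sketch below. It is illustrative only: the type and function names are hypothetical, lower case is chosen arbitrarily as the standardized case, and a `bool array` stands in for the bit mask for readability.

```ocaml
(* Sketch: a label stored as a lower-cased key plus a mask recording
   which positions were upper case in the original spelling. *)
type stored_label = {
  key : string;        (* standardized-case form, used for comparisons *)
  upper : bool array;  (* true where the original octet was 'A'..'Z' *)
}

let store (label : string) : stored_label =
  { key = String.lowercase_ascii label;
    upper =
      Array.init (String.length label) (fun i ->
          let c = label.[i] in
          c >= 'A' && c <= 'Z') }

(* Rebuild the original spelling, e.g. for use in a response. *)
let restore (s : stored_label) : string =
  String.mapi
    (fun i c -> if s.upper.(i) then Char.uppercase_ascii c else c)
    s.key

(* Case-insensitive comparison only needs to look at the keys. *)
let same_label (a : stored_label) (b : stored_label) : bool =
  String.equal a.key b.key
```

Because comparison and hashing would only ever see `key`, a hash table keyed on it treats `VENERA` and `venera` as the same node, while `restore` still returns each label as it was originally written.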
2117 + 2118 + A major aspect of database design is selecting a structure which allows 2119 + the name server to deal with crashes of the name server's host. State 2120 + information which a name server should save across system crashes 2121 + 2122 + 2123 + 2124 + Mockapetris [Page 38] 2125 + 2126 + RFC 1035 Domain Implementation and Specification November 1987 2127 + 2128 + 2129 + includes the catalog structure (including the state of refreshing for 2130 + each zone) and the zone data itself. 2131 + 2132 + 6.1.3. Time 2133 + 2134 + Both the TTL data for RRs and the timing data for refreshing activities 2135 + depends on 32 bit timers in units of seconds. Inside the database, 2136 + refresh timers and TTLs for cached data conceptually "count down", while 2137 + data in the zone stays with constant TTLs. 2138 + 2139 + A recommended implementation strategy is to store time in two ways: as 2140 + a relative increment and as an absolute time. One way to do this is to 2141 + use positive 32 bit numbers for one type and negative numbers for the 2142 + other. The RRs in zones use relative times; the refresh timers and 2143 + cache data use absolute times. Absolute numbers are taken with respect 2144 + to some known origin and converted to relative values when placed in the 2145 + response to a query. When an absolute TTL is negative after conversion 2146 + to relative, then the data is expired and should be ignored. 2147 + 2148 + 6.2. Standard query processing 2149 + 2150 + The major algorithm for standard query processing is presented in 2151 + [RFC-1034]. 2152 + 2153 + When processing queries with QCLASS=*, or some other QCLASS which 2154 + matches multiple classes, the response should never be authoritative 2155 + unless the server can guarantee that the response covers all classes. 
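The two time representations from the Time subsection above can be sketched in OCaml. Note the substitution: instead of the sign trick the text suggests (positive numbers for one kind, negative for the other), this sketch uses a variant type, which expresses the same distinction more directly in OCaml. The names `ttl`, `response_ttl` and `cache_ttl` are hypothetical, and plain `int` seconds are used rather than 32 bit timers.

```ocaml
(* Sketch: a TTL is either relative (zone data, constant) or absolute
   (cache data, an expiry instant measured from some known origin). *)
type ttl =
  | Relative of int  (* seconds; zone data keeps this value unchanged *)
  | Absolute of int  (* expiry instant, in seconds since the origin *)

(* TTL to place in a response built at time [now]. [None] means the
   data has expired and should be ignored. *)
let response_ttl ~now = function
  | Relative seconds -> Some seconds
  | Absolute expiry ->
    let remaining = expiry - now in
    if remaining < 0 then None else Some remaining

(* Convert a TTL received at time [now] into its cached, absolute form. *)
let cache_ttl ~now seconds = Absolute (now + seconds)
```

With this encoding a cached RR "counts down" without being rewritten: only the conversion at response time depends on the current clock.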
2156 + 2157 + When composing a response, RRs which are to be inserted in the 2158 + additional section, but duplicate RRs in the answer or authority 2159 + sections, may be omitted from the additional section. 2160 + 2161 + When a response is so long that truncation is required, the truncation 2162 + should start at the end of the response and work forward in the 2163 + datagram. Thus if there is any data for the authority section, the 2164 + answer section is guaranteed to be unique. 2165 + 2166 + The MINIMUM value in the SOA should be used to set a floor on the TTL of 2167 + data distributed from a zone. This floor function should be done when 2168 + the data is copied into a response. This will allow future dynamic 2169 + update protocols to change the SOA MINIMUM field without ambiguous 2170 + semantics. 2171 + 2172 + 6.3. Zone refresh and reload processing 2173 + 2174 + In spite of a server's best efforts, it may be unable to load zone data 2175 + from a master file due to syntax errors, etc., or be unable to refresh a 2176 + zone within the its expiration parameter. In this case, the name server 2177 + 2178 + 2179 + 2180 + Mockapetris [Page 39] 2181 + 2182 + RFC 1035 Domain Implementation and Specification November 1987 2183 + 2184 + 2185 + should answer queries as if it were not supposed to possess the zone. 2186 + 2187 + If a master is sending a zone out via AXFR, and a new version is created 2188 + during the transfer, the master should continue to send the old version 2189 + if possible. In any case, it should never send part of one version and 2190 + part of another. If completion is not possible, the master should reset 2191 + the connection on which the zone transfer is taking place. 2192 + 2193 + 6.4. Inverse queries (Optional) 2194 + 2195 + Inverse queries are an optional part of the DNS. Name servers are not 2196 + required to support any form of inverse queries. 
If a name server 2197 + receives an inverse query that it does not support, it returns an error 2198 + response with the "Not Implemented" error set in the header. While 2199 + inverse query support is optional, all name servers must be at least 2200 + able to return the error response. 2201 + 2202 + 6.4.1. The contents of inverse queries and responses Inverse 2203 + queries reverse the mappings performed by standard query operations; 2204 + while a standard query maps a domain name to a resource, an inverse 2205 + query maps a resource to a domain name. For example, a standard query 2206 + might bind a domain name to a host address; the corresponding inverse 2207 + query binds the host address to a domain name. 2208 + 2209 + Inverse queries take the form of a single RR in the answer section of 2210 + the message, with an empty question section. The owner name of the 2211 + query RR and its TTL are not significant. The response carries 2212 + questions in the question section which identify all names possessing 2213 + the query RR WHICH THE NAME SERVER KNOWS. Since no name server knows 2214 + about all of the domain name space, the response can never be assumed to 2215 + be complete. Thus inverse queries are primarily useful for database 2216 + management and debugging activities. Inverse queries are NOT an 2217 + acceptable method of mapping host addresses to host names; use the IN- 2218 + ADDR.ARPA domain instead. 2219 + 2220 + Where possible, name servers should provide case-insensitive comparisons 2221 + for inverse queries. Thus an inverse query asking for an MX RR of 2222 + "Venera.isi.edu" should get the same response as a query for 2223 + "VENERA.ISI.EDU"; an inverse query for HINFO RR "IBM-PC UNIX" should 2224 + produce the same result as an inverse query for "IBM-pc unix". However, 2225 + this cannot be guaranteed because name servers may possess RRs that 2226 + contain character strings but the name server does not know that the 2227 + data is character. 
2228 +
2229 + When a name server processes an inverse query, it either returns:
2230 +
2231 +    1. zero, one, or multiple domain names for the specified
2232 +       resource as QNAMEs in the question section
2233 +
2234 +
2235 +
2236 + Mockapetris                                                    [Page 40]
2237 +
2238 + RFC 1035        Domain Implementation and Specification    November 1987
2239 +
2240 +
2241 +    2. an error code indicating that the name server doesn't support
2242 +       inverse mapping of the specified resource type.
2243 +
2244 + When the response to an inverse query contains one or more QNAMEs, the
2245 + owner name and TTL of the RR in the answer section which defines the
2246 + inverse query is modified to exactly match an RR found at the first
2247 + QNAME.
2248 +
2249 + RRs returned in the inverse queries cannot be cached using the same
2250 + mechanism as is used for the replies to standard queries. One reason
2251 + for this is that a name might have multiple RRs of the same type, and
2252 + only one would appear. For example, an inverse query for a single
2253 + address of a multiply homed host might create the impression that only
2254 + one address existed.
2255 +
2256 + 6.4.2. Inverse query and response example The overall structure
2257 + of an inverse query for retrieving the domain name that corresponds to
2258 + Internet address 10.1.0.52 is shown below:
2259 +
2260 +                          +-----------------------------------------+
2261 +            Header        |          OPCODE=IQUERY, ID=997          |
2262 +                          +-----------------------------------------+
2263 +           Question       |                 <empty>                 |
2264 +                          +-----------------------------------------+
2265 +            Answer        |        <anyname> A IN 10.1.0.52         |
2266 +                          +-----------------------------------------+
2267 +           Authority      |                 <empty>                 |
2268 +                          +-----------------------------------------+
2269 +          Additional      |                 <empty>                 |
2270 +                          +-----------------------------------------+
2271 +
2272 + This query asks for a question whose answer is the Internet style
2273 + address 10.1.0.52.
Since the owner name is not known, any domain name
2274 + can be used as a placeholder (and is ignored). A single octet of zero,
2275 + signifying the root, is usually used because it minimizes the length of
2276 + the message. The TTL of the RR is not significant. The response to
2277 + this query might be:
2278 +
2279 +
2280 +
2281 +
2282 +
2283 +
2284 +
2285 +
2286 +
2287 +
2288 +
2289 +
2290 +
2291 +
2292 + Mockapetris                                                    [Page 41]
2293 +
2294 + RFC 1035        Domain Implementation and Specification    November 1987
2295 +
2296 +
2297 +                          +-----------------------------------------+
2298 +            Header        |         OPCODE=RESPONSE, ID=997         |
2299 +                          +-----------------------------------------+
2300 +           Question       |QTYPE=A, QCLASS=IN, QNAME=VENERA.ISI.EDU |
2301 +                          +-----------------------------------------+
2302 +            Answer        |      VENERA.ISI.EDU A IN 10.1.0.52      |
2303 +                          +-----------------------------------------+
2304 +           Authority      |                 <empty>                 |
2305 +                          +-----------------------------------------+
2306 +          Additional      |                 <empty>                 |
2307 +                          +-----------------------------------------+
2308 +
2309 + Note that the QTYPE in a response to an inverse query is the same as the
2310 + TYPE field in the answer section of the inverse query. Responses to
2311 + inverse queries may contain multiple questions when the inverse is not
2312 + unique. If the question section in the response is not empty, then the
2313 + RR in the answer section is modified to correspond to be an exact copy
2314 + of an RR at the first QNAME.
2315 +
2316 + 6.4.3. Inverse query processing
2317 +
2318 + Name servers that support inverse queries can support these operations
2319 + through exhaustive searches of their databases, but this becomes
2320 + impractical as the size of the database increases. An alternative
2321 + approach is to invert the database according to the search key.
2322 +
2323 + For name servers that support multiple zones and a large amount of data,
2324 + the recommended approach is separate inversions for each zone.
When a 2325 + particular zone is changed during a refresh, only its inversions need to 2326 + be redone. 2327 + 2328 + Support for transfer of this type of inversion may be included in future 2329 + versions of the domain system, but is not supported in this version. 2330 + 2331 + 6.5. Completion queries and responses 2332 + 2333 + The optional completion services described in RFC-882 and RFC-883 have 2334 + been deleted. Redesigned services may become available in the future. 2335 + 2336 + 2337 + 2338 + 2339 + 2340 + 2341 + 2342 + 2343 + 2344 + 2345 + 2346 + 2347 + 2348 + Mockapetris [Page 42] 2349 + 2350 + RFC 1035 Domain Implementation and Specification November 1987 2351 + 2352 + 2353 + 7. RESOLVER IMPLEMENTATION 2354 + 2355 + The top levels of the recommended resolver algorithm are discussed in 2356 + [RFC-1034]. This section discusses implementation details assuming the 2357 + database structure suggested in the name server implementation section 2358 + of this memo. 2359 + 2360 + 7.1. Transforming a user request into a query 2361 + 2362 + The first step a resolver takes is to transform the client's request, 2363 + stated in a format suitable to the local OS, into a search specification 2364 + for RRs at a specific name which match a specific QTYPE and QCLASS. 2365 + Where possible, the QTYPE and QCLASS should correspond to a single type 2366 + and a single class, because this makes the use of cached data much 2367 + simpler. The reason for this is that the presence of data of one type 2368 + in a cache doesn't confirm the existence or non-existence of data of 2369 + other types, hence the only way to be sure is to consult an 2370 + authoritative source. If QCLASS=* is used, then authoritative answers 2371 + won't be available. 2372 + 2373 + Since a resolver must be able to multiplex multiple requests if it is to 2374 + perform its function efficiently, each pending request is usually 2375 + represented in some block of state information. 
This state block will 2376 + typically contain: 2377 + 2378 + - A timestamp indicating the time the request began. 2379 + The timestamp is used to decide whether RRs in the database 2380 + can be used or are out of date. This timestamp uses the 2381 + absolute time format previously discussed for RR storage in 2382 + zones and caches. Note that when an RRs TTL indicates a 2383 + relative time, the RR must be timely, since it is part of a 2384 + zone. When the RR has an absolute time, it is part of a 2385 + cache, and the TTL of the RR is compared against the timestamp 2386 + for the start of the request. 2387 + 2388 + Note that using the timestamp is superior to using a current 2389 + time, since it allows RRs with TTLs of zero to be entered in 2390 + the cache in the usual manner, but still used by the current 2391 + request, even after intervals of many seconds due to system 2392 + load, query retransmission timeouts, etc. 2393 + 2394 + - Some sort of parameters to limit the amount of work which will 2395 + be performed for this request. 2396 + 2397 + The amount of work which a resolver will do in response to a 2398 + client request must be limited to guard against errors in the 2399 + database, such as circular CNAME references, and operational 2400 + problems, such as network partition which prevents the 2401 + 2402 + 2403 + 2404 + Mockapetris [Page 43] 2405 + 2406 + RFC 1035 Domain Implementation and Specification November 1987 2407 + 2408 + 2409 + resolver from accessing the name servers it needs. While 2410 + local limits on the number of times a resolver will retransmit 2411 + a particular query to a particular name server address are 2412 + essential, the resolver should have a global per-request 2413 + counter to limit work on a single request. The counter should 2414 + be set to some initial value and decremented whenever the 2415 + resolver performs any action (retransmission timeout, 2416 + retransmission, etc.) 
     If the counter passes zero, the request
     is terminated with a temporary error.

     Note that if the resolver structure allows one request to
     start others in parallel, such as when the need to access a
     name server for one request causes a parallel resolve for the
     name server's addresses, the spawned request should be started
     with a lower counter. This prevents circular references in
     the database from starting a chain reaction of resolver
     activity.

   - The SLIST data structure discussed in [RFC-1034].

     This structure keeps track of the state of a request if it
     must wait for answers from foreign name servers.

7.2. Sending the queries

As described in [RFC-1034], the basic task of the resolver is to
formulate a query which will answer the client's request and direct that
query to name servers which can provide the information. The resolver
will usually only have very strong hints about which servers to ask, in
the form of NS RRs, and may have to revise the query, in response to
CNAMEs, or revise the set of name servers the resolver is asking, in
response to delegation responses which point the resolver to name
servers closer to the desired information. In addition to the
information requested by the client, the resolver may have to call upon
its own services to determine the address of name servers it wishes to
contact.

In any case, the model used in this memo assumes that the resolver is
multiplexing attention between multiple requests, some from the client,
and some internally generated.
Each request is represented by some
state information, and the desired behavior is that the resolver
transmit queries to name servers in a way that maximizes the probability
that the request is answered, minimizes the time that the request takes,
and avoids excessive transmissions. The key algorithm uses the state
information of the request to select the next name server address to
query, and also computes a timeout which will cause the next action
should a response not arrive. The next action will usually be a
transmission to some other server, but may be a temporary error to the
client.

The resolver always starts with a list of server names to query (SLIST).
This list will be all NS RRs which correspond to the nearest ancestor
zone that the resolver knows about. To avoid startup problems, the
resolver should have a set of default servers which it will ask should
it have no current NS RRs which are appropriate. The resolver then adds
to SLIST all of the known addresses for the name servers, and may start
parallel requests to acquire the addresses of the servers when the
resolver has the name, but no addresses, for the name servers.

To complete initialization of SLIST, the resolver attaches whatever
history information it has to each address in SLIST. This will
usually consist of some sort of weighted averages for the response time
of the address, and the batting average of the address (i.e., how often
the address responded at all to the request).
Note that this
information should be kept on a per address basis, rather than on a per
name server basis, because the response time and batting average of a
particular server may vary considerably from address to address. Note
also that this information is actually specific to a resolver address /
server address pair, so a resolver with multiple addresses may wish to
keep separate histories for each of its addresses. Part of this step
must deal with addresses which have no such history; in this case an
expected round trip time of 5-10 seconds should be the worst case, with
lower estimates for the same local network, etc.

Note that whenever a delegation is followed, the resolver algorithm
reinitializes SLIST.

The information establishes a partial ranking of the available name
server addresses. Each time an address is chosen, the state should
be altered to prevent its selection again until all other addresses have
been tried. The timeout for each transmission should be 50-100% greater
than the average predicted value to allow for variance in response.

Some fine points:

   - The resolver may encounter a situation where no addresses are
     available for any of the name servers named in SLIST, and
     where the servers in the list are precisely those which would
     normally be used to look up their own addresses. This
     situation typically occurs when the glue address RRs have a
     smaller TTL than the NS RRs marking delegation, or when the
     resolver caches the result of a NS search. The resolver
     should detect this condition and restart the search at the
     next ancestor zone, or alternatively at the root.

   - If a resolver gets a server error or other bizarre response
     from a name server, it should remove it from SLIST, and may
     wish to schedule an immediate transmission to the next
     candidate server address.

7.3. Processing responses

The first step in processing arriving response datagrams is to parse the
response. This procedure should include:

   - Check the header for reasonableness. Discard datagrams which
     are queries when responses are expected.

   - Parse the sections of the message, and ensure that all RRs are
     correctly formatted.

   - As an optional step, check the TTLs of arriving data looking
     for RRs with excessively long TTLs. If a RR has an
     excessively long TTL, say greater than 1 week, either discard
     the whole response, or limit all TTLs in the response to 1
     week.

The next step is to match the response to a current resolver request.
The recommended strategy is to do a preliminary matching using the ID
field in the domain header, and then to verify that the question section
corresponds to the information currently desired. This requires that
the transmission algorithm devote several bits of the domain ID field to
a request identifier of some sort. This step has several fine points:

   - Some name servers send their responses from different
     addresses than the one used to receive the query. That is, a
     resolver cannot rely on a response coming from the same
     address to which it sent the corresponding query. This name
     server bug is typically encountered in UNIX systems.

   - If the resolver retransmits a particular request to a name
     server it should be able to use a response from any of the
     transmissions. However, if it is using the response to sample
     the round trip time to access the name server, it must be able
     to determine which transmission matches the response (and keep
     transmission times for each outgoing message), or only
     calculate round trip times based on initial transmissions.

   - A name server will occasionally not have a current copy of a
     zone which it should have according to some NS RRs. The
     resolver should simply remove the name server from the current
     SLIST, and continue.

7.4. Using the cache

In general, we expect a resolver to cache all data which it receives in
responses since it may be useful in answering future client requests.
However, there are several types of data which should not be cached:

   - When several RRs of the same type are available for a
     particular owner name, the resolver should either cache them
     all or none at all. When a response is truncated, and a
     resolver doesn't know whether it has a complete set, it should
     not cache a possibly partial set of RRs.

   - Cached data should never be used in preference to
     authoritative data, so if caching would cause this to happen
     the data should not be cached.

   - The results of an inverse query should not be cached.

   - The results of standard queries where the QNAME contains "*"
     labels if the data might be used to construct wildcards.
     The
     reason is that the cache does not necessarily contain existing
     RRs or zone boundary information which is necessary to
     restrict the application of the wildcard RRs.

   - RR data in responses of dubious reliability. When a resolver
     receives unsolicited responses or RR data other than that
     requested, it should discard it without caching it. The basic
     implication is that all sanity checks on a packet should be
     performed before any of it is cached.

In a similar vein, when a resolver has a set of RRs for some name in a
response, and wants to cache the RRs, it should check its cache for
already existing RRs. Depending on the circumstances, either the data
in the response or the cache is preferred, but the two should never be
combined. If the data in the response is from authoritative data in the
answer section, it is always preferred.

8. MAIL SUPPORT

The domain system defines a standard for mapping mailboxes into domain
names, and two methods for using the mailbox information to derive mail
routing information. The first method is called mail exchange binding
and the other method is mailbox binding. The mailbox encoding standard
and mail exchange binding are part of the DNS official protocol, and are
the recommended method for mail routing in the Internet. Mailbox
binding is an experimental feature which is still under development and
subject to change.

The mailbox encoding standard assumes a mailbox name of the form
"<local-part>@<mail-domain>".
While the syntax allowed in each of these
sections varies substantially between the various mail internets, the
preferred syntax for the ARPA Internet is given in [RFC-822].

The DNS encodes the <local-part> as a single label, and encodes the
<mail-domain> as a domain name. The single label from the <local-part>
is prefaced to the domain name from <mail-domain> to form the domain
name corresponding to the mailbox. Thus the mailbox
HOSTMASTER@SRI-NIC.ARPA is mapped into the domain name
HOSTMASTER.SRI-NIC.ARPA. If the <local-part> contains dots or other
special characters, its representation in a master file will require the
use of backslash quoting to ensure that the domain name is properly
encoded. For example, the mailbox Action.domains@ISI.EDU would be
represented as Action\.domains.ISI.EDU.

8.1. Mail exchange binding

Mail exchange binding uses the <mail-domain> part of a mailbox
specification to determine where mail should be sent. The <local-part>
is not even consulted. [RFC-974] specifies this method in detail, and
should be consulted before attempting to use mail exchange support.

One of the advantages of this method is that it decouples mail
destination naming from the hosts used to support mail service, at the
cost of another layer of indirection in the lookup function. However,
the additional layer should eliminate the need for complicated "%", "!",
etc. encodings in <local-part>.

The essence of the method is that the <mail-domain> is used as a domain
name to locate type MX RRs which list hosts willing to accept mail for
<mail-domain>, together with preference values which rank the hosts
according to an order specified by the administrators for <mail-domain>.
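
The mailbox encoding and the MX ranking described above can be sketched in OCaml as follows. This is a minimal illustration under stated assumptions, not part of the RFC or of this library: the names `escape_local_part`, `mailbox_to_domain`, `mx`, and `rank_exchanges` are hypothetical, only "." and "\" are escaped in the local part, and the preference values used in the usage note are made up. It relies on the rule that lower preference values are preferred.

```ocaml
(* Illustrative sketch only; all names here are hypothetical. *)

(* Backslash-quote the characters of a <local-part> that would be
   misread in a master file.  Only '.' and '\\' are handled, which is
   a simplification of "dots or other special characters". *)
let escape_local_part (local : string) : string =
  let buf = Buffer.create (String.length local) in
  String.iter
    (fun c ->
      if c = '.' || c = '\\' then Buffer.add_char buf '\\';
      Buffer.add_char buf c)
    local;
  Buffer.contents buf

(* Map "<local-part>@<mail-domain>" to the master-file form of the
   corresponding domain name.  Returns None when either part is
   missing. *)
let mailbox_to_domain (mailbox : string) : string option =
  match String.rindex_opt mailbox '@' with
  | None -> None
  | Some i ->
      let local = String.sub mailbox 0 i in
      let domain =
        String.sub mailbox (i + 1) (String.length mailbox - i - 1)
      in
      if local = "" || domain = "" then None
      else Some (escape_local_part local ^ "." ^ domain)

(* A mail exchange record reduced to the two fields used for ranking. *)
type mx = { preference : int; exchange : string }

(* Order mail exchanges so that the most preferred host comes first;
   lower preference values are preferred. *)
let rank_exchanges (mxs : mx list) : mx list =
  List.stable_sort (fun a b -> compare a.preference b.preference) mxs
```

For instance, `mailbox_to_domain "Action.domains@ISI.EDU"` yields the master-file name `Action\.domains.ISI.EDU`, and ranking two records with the illustrative preferences 20 and 10 puts the host with preference 10 first.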

In this memo, the <mail-domain> ISI.EDU is used in examples, together
with the hosts VENERA.ISI.EDU and VAXA.ISI.EDU as mail exchanges for
ISI.EDU. If a mailer had a message for Mockapetris@ISI.EDU, it would
route it by looking up MX RRs for ISI.EDU. The MX RRs at ISI.EDU name
VENERA.ISI.EDU and VAXA.ISI.EDU, and type A queries can find the host
addresses.

8.2. Mailbox binding (Experimental)

In mailbox binding, the mailer uses the entire mail destination
specification to construct a domain name. The encoded domain name for
the mailbox is used as the QNAME field in a QTYPE=MAILB query.

Several outcomes are possible for this query:

   1. The query can return a name error indicating that the mailbox
      does not exist as a domain name.

      In the long term, this would indicate that the specified
      mailbox doesn't exist. However, until the use of mailbox
      binding is universal, this error condition should be
      interpreted to mean that the organization identified by the
      global part does not support mailbox binding. The
      appropriate procedure is to revert to exchange binding at
      this point.

   2. The query can return a Mail Rename (MR) RR.

      The MR RR carries new mailbox specification in its RDATA
      field. The mailer should replace the old mailbox with the
      new one and retry the operation.

   3. The query can return a MB RR.

      The MB RR carries a domain name for a host in its RDATA
      field. The mailer should deliver the message to that host
      via whatever protocol is applicable, e.g., SMTP.

   4. The query can return one or more Mail Group (MG) RRs.

      This condition means that the mailbox was actually a mailing
      list or mail group, rather than a single mailbox. Each MG RR
      has a RDATA field that identifies a mailbox that is a member
      of the group. The mailer should deliver a copy of the
      message to each member.

   5. The query can return a MB RR as well as one or more MG RRs.

      This condition means that the mailbox was actually a mailing
      list. The mailer can either deliver the message to the host
      specified by the MB RR, which will in turn do the delivery to
      all members, or the mailer can use the MG RRs to do the
      expansion itself.

In any of these cases, the response may include a Mail Information
(MINFO) RR. This RR is usually associated with a mail group, but is
legal with a MB. The MINFO RR identifies two mailboxes. One of these
identifies a responsible person for the original mailbox name. This
mailbox should be used for requests to be added to a mail group, etc.
The second mailbox name in the MINFO RR identifies a mailbox that should
receive error messages for mail failures. This is particularly
appropriate for mailing lists when errors in member names should be
reported to a person other than the one who sends a message to the list.

New fields may be added to this RR in the future.

9. REFERENCES and BIBLIOGRAPHY

[Dyer 87]       S. Dyer, F. Hsu, "Hesiod", Project Athena
                Technical Plan - Name Service, April 1987, version 1.9.

                Describes the fundamentals of the Hesiod name service.

[IEN-116]       J. Postel, "Internet Name Server", IEN-116,
                USC/Information Sciences Institute, August 1979.

                A name service obsoleted by the Domain Name System, but
                still in use.

[Quarterman 86] J. Quarterman, and J. Hoskins, "Notable Computer Networks",
                Communications of the ACM, October 1986, volume 29, number
                10.

[RFC-742]       K. Harrenstien, "NAME/FINGER", RFC-742, Network
                Information Center, SRI International, December 1977.

[RFC-768]       J. Postel, "User Datagram Protocol", RFC-768,
                USC/Information Sciences Institute, August 1980.

[RFC-793]       J. Postel, "Transmission Control Protocol", RFC-793,
                USC/Information Sciences Institute, September 1981.

[RFC-799]       D. Mills, "Internet Name Domains", RFC-799, COMSAT,
                September 1981.

                Suggests introduction of a hierarchy in place of a flat
                name space for the Internet.

[RFC-805]       J. Postel, "Computer Mail Meeting Notes", RFC-805,
                USC/Information Sciences Institute, February 1982.

[RFC-810]       E. Feinler, K. Harrenstien, Z. Su, and V. White, "DOD
                Internet Host Table Specification", RFC-810, Network
                Information Center, SRI International, March 1982.

                Obsolete. See RFC-952.

[RFC-811]       K. Harrenstien, V. White, and E. Feinler, "Hostnames
                Server", RFC-811, Network Information Center, SRI
                International, March 1982.

                Obsolete. See RFC-953.

[RFC-812]       K. Harrenstien, and V. White, "NICNAME/WHOIS", RFC-812,
                Network Information Center, SRI International, March
                1982.

[RFC-819]       Z. Su, and J. Postel, "The Domain Naming Convention for
                Internet User Applications", RFC-819, Network
                Information Center, SRI International, August 1982.

                Early thoughts on the design of the domain system.
                Current implementation is completely different.

[RFC-821]       J. Postel, "Simple Mail Transfer Protocol", RFC-821,
                USC/Information Sciences Institute, August 1982.

[RFC-830]       Z. Su, "A Distributed System for Internet Name Service",
                RFC-830, Network Information Center, SRI International,
                October 1982.

                Early thoughts on the design of the domain system.
                Current implementation is completely different.

[RFC-882]       P. Mockapetris, "Domain names - Concepts and
                Facilities," RFC-882, USC/Information Sciences
                Institute, November 1983.

                Superseded by this memo.

[RFC-883]       P. Mockapetris, "Domain names - Implementation and
                Specification," RFC-883, USC/Information Sciences
                Institute, November 1983.

                Superseded by this memo.

[RFC-920]       J. Postel and J. Reynolds, "Domain Requirements",
                RFC-920, USC/Information Sciences Institute,
                October 1984.

                Explains the naming scheme for top level domains.

[RFC-952]       K. Harrenstien, M. Stahl, E. Feinler, "DoD Internet Host
                Table Specification", RFC-952, SRI, October 1985.

                Specifies the format of HOSTS.TXT, the host/address
                table replaced by the DNS.

[RFC-953]       K. Harrenstien, M. Stahl, E. Feinler, "HOSTNAME Server",
                RFC-953, SRI, October 1985.

                This RFC contains the official specification of the
                hostname server protocol, which is obsoleted by the DNS.
                This TCP based protocol accesses information stored in
                the RFC-952 format, and is used to obtain copies of the
                host table.

[RFC-973]       P.
                Mockapetris, "Domain System Changes and
                Observations", RFC-973, USC/Information Sciences
                Institute, January 1986.

                Describes changes to RFC-882 and RFC-883 and reasons for
                them.

[RFC-974]       C. Partridge, "Mail routing and the domain system",
                RFC-974, CSNET CIC BBN Labs, January 1986.

                Describes the transition from HOSTS.TXT based mail
                addressing to the more powerful MX system used with the
                domain system.

[RFC-1001]      NetBIOS Working Group, "Protocol standard for a NetBIOS
                service on a TCP/UDP transport: Concepts and Methods",
                RFC-1001, March 1987.

                This RFC and RFC-1002 are a preliminary design for
                NETBIOS on top of TCP/IP which proposes to base NetBIOS
                name service on top of the DNS.

[RFC-1002]      NetBIOS Working Group, "Protocol standard for a NetBIOS
                service on a TCP/UDP transport: Detailed
                Specifications", RFC-1002, March 1987.

[RFC-1010]      J. Reynolds, and J. Postel, "Assigned Numbers", RFC-1010,
                USC/Information Sciences Institute, May 1987.

                Contains socket numbers and mnemonics for host names,
                operating systems, etc.

[RFC-1031]      W. Lazear, "MILNET Name Domain Transition", RFC-1031,
                November 1987.

                Describes a plan for converting the MILNET to the DNS.

[RFC-1032]      M. Stahl, "Establishing a Domain - Guidelines for
                Administrators", RFC-1032, November 1987.

                Describes the registration policies used by the NIC to
                administer the top level domains and delegate subzones.

[RFC-1033]      M. Lottor, "Domain Administrators Operations Guide",
                RFC-1033, November 1987.

                A cookbook for domain administrators.

[Solomon 82]    M.
                Solomon, L. Landweber, and D. Neuhengen, "The CSNET
                Name Server", Computer Networks, vol 6, nr 3, July 1982.

                Describes a name service for CSNET which is independent
                from the DNS and DNS use in the CSNET.

Index

   *                      13

   ;                      33, 35

   <character-string>     35
   <domain-name>          34

   @                      35

   \                      35

   A                      12

   Byte order             8

   CH                     13
   Character case         9
   CLASS                  11
   CNAME                  12
   Completion             42
   CS                     13

   Hesiod                 13
   HINFO                  12
   HS                     13

   IN                     13
   IN-ADDR.ARPA domain    22
   Inverse queries        40

   Mailbox names          47
   MB                     12
   MD                     12
   MF                     12
   MG                     12
   MINFO                  12
   MINIMUM                20
   MR                     12
   MX                     12

   NS                     12
   NULL                   12

   Port numbers           32
   Primary server         5
   PTR                    12, 18

   QCLASS                 13
   QTYPE                  12

   RDATA                  12
   RDLENGTH               11

   Secondary server       5
   SOA                    12
   Stub resolvers         7

   TCP                    32
   TXT                    12
   TYPE                   11

   UDP                    32

   WKS                    12
+1963
spec/rfc3492.txt
···
Network Working Group                                        A. Costello
Request for Comments: 3492                 Univ. of California, Berkeley
Category: Standards Track                                     March 2003

              Punycode: A Bootstring encoding of Unicode
       for Internationalized Domain Names in Applications (IDNA)

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   Punycode is a simple and efficient transfer encoding syntax designed
   for use with Internationalized Domain Names in Applications (IDNA).
   It uniquely and reversibly transforms a Unicode string into an ASCII
   string. ASCII characters in the Unicode string are represented
   literally, and non-ASCII characters are represented by ASCII
   characters that are allowed in host name labels (letters, digits, and
   hyphens). This document defines a general algorithm called
   Bootstring that allows a string of basic code points to uniquely
   represent any string of code points drawn from a larger set.
   Punycode is an instance of Bootstring that uses particular parameter
   values specified by this document, appropriate for IDNA.

Table of Contents

   1. Introduction...............................................2
       1.1 Features..............................................2
       1.2 Interaction of protocol parts.........................3
   2. Terminology................................................3
   3.
Bootstring description.....................................4
       3.1 Basic code point segregation..........................4
       3.2 Insertion unsort coding...............................4
       3.3 Generalized variable-length integers..................5
       3.4 Bias adaptation.......................................7
   4. Bootstring parameters......................................8
   5. Parameter values for Punycode..............................8
   6. Bootstring algorithms......................................9

Costello                    Standards Track                     [Page 1]

RFC 3492                     IDNA Punycode                    March 2003

       6.1 Bias adaptation function.............................10
       6.2 Decoding procedure...................................11
       6.3 Encoding procedure...................................12
       6.4 Overflow handling....................................13
   7. Punycode examples.........................................14
       7.1 Sample strings.......................................14
       7.2 Decoding traces......................................17
       7.3 Encoding traces......................................19
   8. Security Considerations...................................20
   9. References................................................21
       9.1 Normative References.................................21
       9.2 Informative References...............................21
   A. Mixed-case annotation.....................................22
   B. Disclaimer and license....................................22
   C. Punycode sample implementation............................23
   Author's Address.............................................34
   Full Copyright Statement.....................................35

1. Introduction

   [IDNA] describes an architecture for supporting internationalized
   domain names.
   Labels containing non-ASCII characters can be
   represented by ACE labels, which begin with a special ACE prefix and
   contain only ASCII characters. The remainder of the label after the
   prefix is a Punycode encoding of a Unicode string satisfying certain
   constraints. For the details of the prefix and constraints, see
   [IDNA] and [NAMEPREP].

   Punycode is an instance of a more general algorithm called
   Bootstring, which allows strings composed from a small set of "basic"
   code points to uniquely represent any string of code points drawn
   from a larger set. Punycode is Bootstring with particular parameter
   values appropriate for IDNA.

1.1 Features

   Bootstring has been designed to have the following features:

   *  Completeness: Every extended string (sequence of arbitrary code
      points) can be represented by a basic string (sequence of basic
      code points). Restrictions on what strings are allowed, and on
      length, can be imposed by higher layers.

   *  Uniqueness: There is at most one basic string that represents a
      given extended string.

   *  Reversibility: Any extended string mapped to a basic string can
      be recovered from that basic string.

   *  Efficient encoding: The ratio of basic string length to extended
      string length is small. This is important in the context of
      domain names because RFC 1034 [RFC1034] restricts the length of a
      domain label to 63 characters.

   *  Simplicity: The encoding and decoding algorithms are reasonably
      simple to implement. The goals of efficiency and simplicity are
      at odds; Bootstring aims at a good balance between them.

   *  Readability: Basic code points appearing in the extended string
      are represented as themselves in the basic string (although the
      main purpose is to improve efficiency, not readability).

   Punycode can also support an additional feature that is not used by
   the ToASCII and ToUnicode operations of [IDNA]. When extended
   strings are case-folded prior to encoding, the basic string can use
   mixed case to tell how to convert the folded string into a mixed-case
   string. See appendix A "Mixed-case annotation".

1.2 Interaction of protocol parts

   Punycode is used by the IDNA protocol [IDNA] for converting domain
   labels into ASCII; it is not designed for any other purpose. It is
   explicitly not designed for processing arbitrary free text.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

   A code point is an integral value associated with a character in a
   coded character set.

   As in the Unicode Standard [UNICODE], Unicode code points are denoted
   by "U+" followed by four to six hexadecimal digits, while a range of
   code points is denoted by two hexadecimal numbers separated by "..",
   with no prefixes.

   The operators div and mod perform integer division; (x div y) is the
   quotient of x divided by y, discarding the remainder, and (x mod y)
   is the remainder, so (x div y) * y + (x mod y) == x. Bootstring uses
   these operators only with nonnegative operands, so the quotient and
   remainder are always nonnegative.

   The break statement jumps out of the innermost loop (as in C).
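
The notation defined in this section maps directly onto ordinary integer arithmetic. As a minimal sketch in OCaml, the following renders a code point in the "U+" notation and checks the div/mod identity for nonnegative operands. The helper names `code_point_label` and `div_mod_identity_holds` are hypothetical and appear neither in the RFC nor in this library.

```ocaml
(* Hypothetical helpers for illustration; not part of RFC 3492 or of
   this library. *)

(* Render a code point in the "U+" notation, padded to at least four
   uppercase hexadecimal digits. *)
let code_point_label (cp : int) : string = Printf.sprintf "U+%04X" cp

(* Check the identity (x div y) * y + (x mod y) == x.  OCaml's (/) and
   (mod) agree with the RFC's div and mod when both operands are
   nonnegative and y is nonzero. *)
let div_mod_identity_holds (x : int) (y : int) : bool =
  (x / y) * y + (x mod y) = x

let () =
  assert (code_point_label 0x41 = "U+0041");
  assert (code_point_label 0x1F600 = "U+1F600");
  (* 17 div 5 = 3 and 17 mod 5 = 2, so 3 * 5 + 2 = 17. *)
  assert (div_mod_identity_holds 17 5)
```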
166 + 167 + 168 + 169 + 170 + Costello Standards Track [Page 3] 171 + 172 + RFC 3492 IDNA Punycode March 2003 173 + 174 + 175 + An overflow is an attempt to compute a value that exceeds the maximum 176 + value of an integer variable. 177 + 178 + 3. Bootstring description 179 + 180 + Bootstring represents an arbitrary sequence of code points (the 181 + "extended string") as a sequence of basic code points (the "basic 182 + string"). This section describes the representation. Section 6 183 + "Bootstring algorithms" presents the algorithms as pseudocode. 184 + Sections 7.1 "Decoding traces" and 7.2 "Encoding traces" trace the 185 + algorithms for sample inputs. 186 + 187 + The following sections describe the four techniques used in 188 + Bootstring. "Basic code point segregation" is a very simple and 189 + efficient encoding for basic code points occurring in the extended 190 + string: they are simply copied all at once. "Insertion unsort 191 + coding" encodes the non-basic code points as deltas, and processes 192 + the code points in numerical order rather than in order of 193 + appearance, which typically results in smaller deltas. The deltas 194 + are represented as "generalized variable-length integers", which use 195 + basic code points to represent nonnegative integers. The parameters 196 + of this integer representation are dynamically adjusted using "bias 197 + adaptation", to improve efficiency when consecutive deltas have 198 + similar magnitudes. 199 + 200 + 3.1 Basic code point segregation 201 + 202 + All basic code points appearing in the extended string are 203 + represented literally at the beginning of the basic string, in their 204 + original order, followed by a delimiter if (and only if) the number 205 + of basic code points is nonzero. The delimiter is a particular basic 206 + code point, which never appears in the remainder of the basic string. 
207 + The decoder can therefore find the end of the literal portion (if 208 + there is one) by scanning for the last delimiter. 209 + 210 + 3.2 Insertion unsort coding 211 + 212 + The remainder of the basic string (after the last delimiter if there 213 + is one) represents a sequence of nonnegative integral deltas as 214 + generalized variable-length integers, described in section 3.3. The 215 + meaning of the deltas is best understood in terms of the decoder. 216 + 217 + The decoder builds the extended string incrementally. Initially, the 218 + extended string is a copy of the literal portion of the basic string 219 + (excluding the last delimiter). The decoder inserts non-basic code 220 + points, one for each delta, into the extended string, ultimately 221 + arriving at the final decoded string. 222 + 223 + 224 + 225 + 226 + Costello Standards Track [Page 4] 227 + 228 + RFC 3492 IDNA Punycode March 2003 229 + 230 + 231 + At the heart of this process is a state machine with two state 232 + variables: an index i and a counter n. The index i refers to a 233 + position in the extended string; it ranges from 0 (the first 234 + position) to the current length of the extended string (which refers 235 + to a potential position beyond the current end). If the current 236 + state is <n,i>, the next state is <n,i+1> if i is less than the 237 + length of the extended string, or <n+1,0> if i equals the length of 238 + the extended string. In other words, each state change causes i to 239 + increment, wrapping around to zero if necessary, and n counts the 240 + number of wrap-arounds. 241 + 242 + Notice that the state always advances monotonically (there is no way 243 + for the decoder to return to an earlier state). At each state, an 244 + insertion is either performed or not performed. At most one 245 + insertion is performed in a given state. An insertion inserts the 246 + value of n at position i in the extended string. 
The deltas are a 247 + run-length encoding of this sequence of events: they are the lengths 248 + of the runs of non-insertion states preceeding the insertion states. 249 + Hence, for each delta, the decoder performs delta state changes, then 250 + an insertion, and then one more state change. (An implementation 251 + need not perform each state change individually, but can instead use 252 + division and remainder calculations to compute the next insertion 253 + state directly.) It is an error if the inserted code point is a 254 + basic code point (because basic code points were supposed to be 255 + segregated as described in section 3.1). 256 + 257 + The encoder's main task is to derive the sequence of deltas that will 258 + cause the decoder to construct the desired string. It can do this by 259 + repeatedly scanning the extended string for the next code point that 260 + the decoder would need to insert, and counting the number of state 261 + changes the decoder would need to perform, mindful of the fact that 262 + the decoder's extended string will include only those code points 263 + that have already been inserted. Section 6.3 "Encoding procedure" 264 + gives a precise algorithm. 265 + 266 + 3.3 Generalized variable-length integers 267 + 268 + In a conventional integer representation the base is the number of 269 + distinct symbols for digits, whose values are 0 through base-1. Let 270 + digit_0 denote the least significant digit, digit_1 the next least 271 + significant, and so on. The value represented is the sum over j of 272 + digit_j * w(j), where w(j) = base^j is the weight (scale factor) for 273 + position j. For example, in the base 8 integer 437, the digits are 274 + 7, 3, and 4, and the weights are 1, 8, and 64, so the value is 7 + 275 + 3*8 + 4*64 = 287. 
This representation has two disadvantages: First, 276 + there are multiple encodings of each value (because there can be 277 + extra zeros in the most significant positions), which is inconvenient 278 + 279 + 280 + 281 + 282 + Costello Standards Track [Page 5] 283 + 284 + RFC 3492 IDNA Punycode March 2003 285 + 286 + 287 + when unique encodings are needed. Second, the integer is not self- 288 + delimiting, so if multiple integers are concatenated the boundaries 289 + between them are lost. 290 + 291 + The generalized variable-length representation solves these two 292 + problems. The digit values are still 0 through base-1, but now the 293 + integer is self-delimiting by means of thresholds t(j), each of which 294 + is in the range 0 through base-1. Exactly one digit, the most 295 + significant, satisfies digit_j < t(j). Therefore, if several 296 + integers are concatenated, it is easy to separate them, starting with 297 + the first if they are little-endian (least significant digit first), 298 + or starting with the last if they are big-endian (most significant 299 + digit first). As before, the value is the sum over j of digit_j * 300 + w(j), but the weights are different: 301 + 302 + w(0) = 1 303 + w(j) = w(j-1) * (base - t(j-1)) for j > 0 304 + 305 + For example, consider the little-endian sequence of base 8 digits 306 + 734251... Suppose the thresholds are 2, 3, 5, 5, 5, 5... This 307 + implies that the weights are 1, 1*(8-2) = 6, 6*(8-3) = 30, 30*(8-5) = 308 + 90, 90*(8-5) = 270, and so on. 7 is not less than 2, and 3 is not 309 + less than 3, but 4 is less than 5, so 4 is the last digit. The value 310 + of 734 is 7*1 + 3*6 + 4*30 = 145. The next integer is 251, with 311 + value 2*1 + 5*6 + 1*30 = 62. Decoding this representation is very 312 + similar to decoding a conventional integer: Start with a current 313 + value of N = 0 and a weight w = 1. Fetch the next digit d and 314 + increase N by d * w. 
If d is less than the current threshold (t) 315 + then stop, otherwise increase w by a factor of (base - t), update t 316 + for the next position, and repeat. 317 + 318 + Encoding this representation is similar to encoding a conventional 319 + integer: If N < t then output one digit for N and stop, otherwise 320 + output the digit for t + ((N - t) mod (base - t)), then replace N 321 + with (N - t) div (base - t), update t for the next position, and 322 + repeat. 323 + 324 + For any particular set of values of t(j), there is exactly one 325 + generalized variable-length representation of each nonnegative 326 + integral value. 327 + 328 + Bootstring uses little-endian ordering so that the deltas can be 329 + separated starting with the first. The t(j) values are defined in 330 + terms of the constants base, tmin, and tmax, and a state variable 331 + called bias: 332 + 333 + t(j) = base * (j + 1) - bias, 334 + clamped to the range tmin through tmax 335 + 336 + 337 + 338 + Costello Standards Track [Page 6] 339 + 340 + RFC 3492 IDNA Punycode March 2003 341 + 342 + 343 + The clamping means that if the formula yields a value less than tmin 344 + or greater than tmax, then t(j) = tmin or tmax, respectively. (In 345 + the pseudocode in section 6 "Bootstring algorithms", the expression 346 + base * (j + 1) is denoted by k for performance reasons.) These t(j) 347 + values cause the representation to favor integers within a particular 348 + range determined by the bias. 349 + 350 + 3.4 Bias adaptation 351 + 352 + After each delta is encoded or decoded, bias is set for the next 353 + delta as follows: 354 + 355 + 1. Delta is scaled in order to avoid overflow in the next step: 356 + 357 + let delta = delta div 2 358 + 359 + But when this is the very first delta, the divisor is not 2, but 360 + instead a constant called damp. This compensates for the fact 361 + that the second delta is usually much smaller than the first. 362 + 363 + 2. 
Delta is increased to compensate for the fact that the next delta 364 + will be inserting into a longer string: 365 + 366 + let delta = delta + (delta div numpoints) 367 + 368 + numpoints is the total number of code points encoded/decoded so 369 + far (including the one corresponding to this delta itself, and 370 + including the basic code points). 371 + 372 + 3. Delta is repeatedly divided until it falls within a threshold, to 373 + predict the minimum number of digits needed to represent the next 374 + delta: 375 + 376 + while delta > ((base - tmin) * tmax) div 2 377 + do let delta = delta div (base - tmin) 378 + 379 + 4. The bias is set: 380 + 381 + let bias = 382 + (base * the number of divisions performed in step 3) + 383 + (((base - tmin + 1) * delta) div (delta + skew)) 384 + 385 + The motivation for this procedure is that the current delta 386 + provides a hint about the likely size of the next delta, and so 387 + t(j) is set to tmax for the more significant digits starting with 388 + the one expected to be last, tmin for the less significant digits 389 + up through the one expected to be third-last, and somewhere 390 + between tmin and tmax for the digit expected to be second-last 391 + 392 + 393 + 394 + Costello Standards Track [Page 7] 395 + 396 + RFC 3492 IDNA Punycode March 2003 397 + 398 + 399 + (balancing the hope of the expected-last digit being unnecessary 400 + against the danger of it being insufficient). 401 + 402 + 4. Bootstring parameters 403 + 404 + Given a set of basic code points, one needs to be designated as the 405 + delimiter. The base cannot be greater than the number of 406 + distinguishable basic code points remaining. The digit-values in the 407 + range 0 through base-1 need to be associated with distinct non- 408 + delimiter basic code points. 
In some cases multiple code points need 409 + to have the same digit-value; for example, uppercase and lowercase 410 + versions of the same letter need to be equivalent if basic strings 411 + are case-insensitive. 412 + 413 + The initial value of n cannot be greater than the minimum non-basic 414 + code point that could appear in extended strings. 415 + 416 + The remaining five parameters (tmin, tmax, skew, damp, and the 417 + initial value of bias) need to satisfy the following constraints: 418 + 419 + 0 <= tmin <= tmax <= base-1 420 + skew >= 1 421 + damp >= 2 422 + initial_bias mod base <= base - tmin 423 + 424 + Provided the constraints are satisfied, these five parameters affect 425 + efficiency but not correctness. They are best chosen empirically. 426 + 427 + If support for mixed-case annotation is desired (see appendix A), 428 + make sure that the code points corresponding to 0 through tmax-1 all 429 + have both uppercase and lowercase forms. 430 + 431 + 5. Parameter values for Punycode 432 + 433 + Punycode uses the following Bootstring parameter values: 434 + 435 + base = 36 436 + tmin = 1 437 + tmax = 26 438 + skew = 38 439 + damp = 700 440 + initial_bias = 72 441 + initial_n = 128 = 0x80 442 + 443 + Although the only restriction Punycode imposes on the input integers 444 + is that they be nonnegative, these parameters are especially designed 445 + to work well with Unicode [UNICODE] code points, which are integers 446 + in the range 0..10FFFF (but not D800..DFFF, which are reserved for 447 + 448 + 449 + 450 + Costello Standards Track [Page 8] 451 + 452 + RFC 3492 IDNA Punycode March 2003 453 + 454 + 455 + use by the UTF-16 encoding of Unicode). 
The basic code points are 456 + the ASCII [ASCII] code points (0..7F), of which U+002D (-) is the 457 + delimiter, and some of the others have digit-values as follows: 458 + 459 + code points digit-values 460 + ------------ ---------------------- 461 + 41..5A (A-Z) = 0 to 25, respectively 462 + 61..7A (a-z) = 0 to 25, respectively 463 + 30..39 (0-9) = 26 to 35, respectively 464 + 465 + Using hyphen-minus as the delimiter implies that the encoded string 466 + can end with a hyphen-minus only if the Unicode string consists 467 + entirely of basic code points, but IDNA forbids such strings from 468 + being encoded. The encoded string can begin with a hyphen-minus, but 469 + IDNA prepends a prefix. Therefore IDNA using Punycode conforms to 470 + the RFC 952 rule that host name labels neither begin nor end with a 471 + hyphen-minus [RFC952]. 472 + 473 + A decoder MUST recognize the letters in both uppercase and lowercase 474 + forms (including mixtures of both forms). An encoder SHOULD output 475 + only uppercase forms or only lowercase forms, unless it uses mixed- 476 + case annotation (see appendix A). 477 + 478 + Presumably most users will not manually write or type encoded strings 479 + (as opposed to cutting and pasting them), but those who do will need 480 + to be alert to the potential visual ambiguity between the following 481 + sets of characters: 482 + 483 + G 6 484 + I l 1 485 + O 0 486 + S 5 487 + U V 488 + Z 2 489 + 490 + Such ambiguities are usually resolved by context, but in a Punycode 491 + encoded string there is no context apparent to humans. 492 + 493 + 6. Bootstring algorithms 494 + 495 + Some parts of the pseudocode can be omitted if the parameters satisfy 496 + certain conditions (for which Punycode qualifies). These parts are 497 + enclosed in {braces}, and notes immediately following the pseudocode 498 + explain the conditions under which they can be omitted. 
499 +
500 +
501 +
502 +
503 +
504 +
505 +
506 + Costello                    Standards Track                     [Page 9]
507 +
508 + RFC 3492                     IDNA Punycode                    March 2003
509 +
510 +
511 +    Formally, code points are integers, and hence the pseudocode assumes
512 +    that arithmetic operations can be performed directly on code points.
513 +    In some programming languages, explicit conversion between code
514 +    points and integers might be necessary.
515 +
516 + 6.1 Bias adaptation function
517 +
518 +    function adapt(delta,numpoints,firsttime):
519 +      if firsttime then let delta = delta div damp
520 +      else let delta = delta div 2
521 +      let delta = delta + (delta div numpoints)
522 +      let k = 0
523 +      while delta > ((base - tmin) * tmax) div 2 do begin
524 +        let delta = delta div (base - tmin)
525 +        let k = k + base
526 +      end
527 +      return k + (((base - tmin + 1) * delta) div (delta + skew))
528 +
529 +    It does not matter whether the modifications to delta and k inside
530 +    adapt() affect variables of the same name inside the
531 +    encoding/decoding procedures, because after calling adapt() the
532 +    caller does not read those variables before overwriting them.
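As an illustration only, the bias adaptation function above can be transcribed into OCaml roughly as follows. This is a minimal sketch, not code taken from this repository: the labelled-argument signature of `adapt` is an assumption made for the example, and the constants are the Punycode parameter values from section 5.

```ocaml
(* Punycode parameter values (section 5 of RFC 3492). *)
let base = 36
let tmin = 1
let tmax = 26
let skew = 38
let damp = 700

(* Bias adaptation, following the pseudocode of section 6.1.
   The labelled signature is a choice made for this sketch.
   All operands are nonnegative, so OCaml's (/) behaves like "div". *)
let adapt ~delta ~numpoints ~firsttime =
  (* Step 1: scale delta; the very first delta is divided by damp. *)
  let delta = if firsttime then delta / damp else delta / 2 in
  (* Step 2: compensate for the string having grown by one code point. *)
  let delta = delta + (delta / numpoints) in
  (* Step 3: divide until delta falls below the threshold, adding base
     to k for every division performed. *)
  let rec loop delta k =
    if delta > ((base - tmin) * tmax) / 2 then
      loop (delta / (base - tmin)) (k + base)
    else
      (* Step 4: the new bias. *)
      k + (((base - tmin + 1) * delta) / (delta + skew))
  in
  loop delta 0
```

With these parameters the sketch reproduces the first bias updates of the example B trace in section 7.2: the bias goes from 72 to 21, then to 20, then to 13.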
533 +
534 +
535 +
536 +
537 +
538 +
539 +
540 +
541 +
542 +
543 +
544 +
545 +
546 +
547 +
548 +
549 +
550 +
551 +
552 +
553 +
554 +
555 +
556 +
557 +
558 +
559 +
560 +
561 +
562 + Costello                    Standards Track                    [Page 10]
563 +
564 + RFC 3492                     IDNA Punycode                    March 2003
565 +
566 +
567 + 6.2 Decoding procedure
568 +
569 +    let n = initial_n
570 +    let i = 0
571 +    let bias = initial_bias
572 +    let output = an empty string indexed from 0
573 +    consume all code points before the last delimiter (if there is one)
574 +      and copy them to output, fail on any non-basic code point
575 +    if more than zero code points were consumed then consume one more
576 +      (which will be the last delimiter)
577 +    while the input is not exhausted do begin
578 +      let oldi = i
579 +      let w = 1
580 +      for k = base to infinity in steps of base do begin
581 +        consume a code point, or fail if there was none to consume
582 +        let digit = the code point's digit-value, fail if it has none
583 +        let i = i + digit * w, fail on overflow
584 +        let t = tmin if k <= bias {+ tmin}, or
585 +                tmax if k >= bias + tmax, or k - bias otherwise
586 +        if digit < t then break
587 +        let w = w * (base - t), fail on overflow
588 +      end
589 +      let bias = adapt(i - oldi, length(output) + 1, test oldi is 0?)
590 +      let n = n + i div (length(output) + 1), fail on overflow
591 +      let i = i mod (length(output) + 1)
592 +      {if n is a basic code point then fail}
593 +      insert n into output at position i
594 +      increment i
595 +    end
596 +
597 +    The full statement enclosed in braces (checking whether n is a basic
598 +    code point) can be omitted if initial_n exceeds all basic code points
599 +    (which is true for Punycode), because n is never less than initial_n.
600 +
601 +    In the assignment of t, where t is clamped to the range tmin through
602 +    tmax, "+ tmin" can always be omitted.
This makes the clamping 603 + calculation incorrect when bias < k < bias + tmin, but that cannot 604 + happen because of the way bias is computed and because of the 605 + constraints on the parameters. 606 + 607 + Because the decoder state can only advance monotonically, and there 608 + is only one representation of any delta, there is therefore only one 609 + encoded string that can represent a given sequence of integers. The 610 + only error conditions are invalid code points, unexpected end-of- 611 + input, overflow, and basic code points encoded using deltas instead 612 + of appearing literally. If the decoder fails on these errors as 613 + shown above, then it cannot produce the same output for two distinct 614 + inputs. Without this property it would have been necessary to re- 615 + 616 + 617 + 618 + Costello Standards Track [Page 11] 619 + 620 + RFC 3492 IDNA Punycode March 2003 621 + 622 + 623 + encode the output and verify that it matches the input in order to 624 + guarantee the uniqueness of the encoding. 
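The decoding procedure of section 6.2 could be sketched in OCaml along the following lines. The names `decode`, `digit_value`, `insert_at`, and `Bad_input` are hypothetical and are not part of the Publicsuffix library. The sketch assumes the input is a byte string with the ACE prefix already removed, returns the decoded code points as a list of integers, and detects overflow against OCaml's native `max_int` using the comparison technique of section 6.4.

```ocaml
let base = 36 and tmin = 1 and tmax = 26
let skew = 38 and damp = 700
let initial_bias = 72 and initial_n = 128

exception Bad_input of string  (* hypothetical error type for this sketch *)

(* Bias adaptation (section 6.1). *)
let adapt delta numpoints firsttime =
  let delta = if firsttime then delta / damp else delta / 2 in
  let delta = delta + (delta / numpoints) in
  let rec go delta k =
    if delta > ((base - tmin) * tmax) / 2
    then go (delta / (base - tmin)) (k + base)
    else k + (((base - tmin + 1) * delta) / (delta + skew))
  in
  go delta 0

(* Digit-values from section 5: letters (either case) 0..25, digits 26..35. *)
let digit_value = function
  | 'a' .. 'z' as c -> Some (Char.code c - Char.code 'a')
  | 'A' .. 'Z' as c -> Some (Char.code c - Char.code 'A')
  | '0' .. '9' as c -> Some (Char.code c - Char.code '0' + 26)
  | _ -> None

(* Insert x at position i of a list, with 0 <= i <= length. *)
let rec insert_at i x = function
  | l when i = 0 -> x :: l
  | [] -> [ x ]
  | y :: ys -> y :: insert_at (i - 1) x ys

let decode (input : string) : int list =
  let fail msg = raise (Bad_input msg) in
  let len = String.length input in
  (* b = number of code points before the last delimiter, if there is one. *)
  let b = match String.rindex_opt input '-' with Some p -> p | None -> 0 in
  let output =
    ref (List.init b (fun j ->
             let c = Char.code input.[j] in
             if c >= 0x80 then fail "non-basic code point" else c))
  in
  let out_len = ref b in
  (* Skip the delimiter only if at least one code point was consumed. *)
  let pos = ref (if b > 0 then b + 1 else 0) in
  let n = ref initial_n and i = ref 0 and bias = ref initial_bias in
  while !pos < len do
    let oldi = !i in
    let w = ref 1 and k = ref base and more = ref true in
    (* Decode one generalized variable-length integer into i. *)
    while !more do
      if !pos >= len then fail "unexpected end of input";
      let digit =
        match digit_value input.[!pos] with
        | Some d -> d
        | None -> fail "invalid digit"
      in
      incr pos;
      if digit > (max_int - !i) / !w then fail "overflow";
      i := !i + (digit * !w);
      let t =
        if !k <= !bias then tmin
        else if !k >= !bias + tmax then tmax
        else !k - !bias
      in
      if digit < t then more := false
      else begin
        if !w > max_int / (base - t) then fail "overflow";
        w := !w * (base - t);
        k := !k + base
      end
    done;
    let numpoints = !out_len + 1 in
    bias := adapt (!i - oldi) numpoints (oldi = 0);
    if !i / numpoints > max_int - !n then fail "overflow";
    n := !n + (!i / numpoints);
    i := !i mod numpoints;
    (* The basic-code-point check on n is omitted, as the text above
       permits for Punycode (n never drops below initial_n). *)
    output := insert_at !i !n !output;
    incr out_len;
    incr i
  done;
  !output
```

Under these assumptions, `decode "ihqwcrb4cv8a8dqg056pqjye"` should yield the nine code points of example B from section 7.1. The list-based insertion is chosen for brevity rather than speed; labels are at most 63 characters long, so the quadratic cost is not a concern in this setting.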
625 +
626 + 6.3 Encoding procedure
627 +
628 +    let n = initial_n
629 +    let delta = 0
630 +    let bias = initial_bias
631 +    let h = b = the number of basic code points in the input
632 +    copy them to the output in order, followed by a delimiter if b > 0
633 +    {if the input contains a non-basic code point < n then fail}
634 +    while h < length(input) do begin
635 +      let m = the minimum {non-basic} code point >= n in the input
636 +      let delta = delta + (m - n) * (h + 1), fail on overflow
637 +      let n = m
638 +      for each code point c in the input (in order) do begin
639 +        if c < n {or c is basic} then increment delta, fail on overflow
640 +        if c == n then begin
641 +          let q = delta
642 +          for k = base to infinity in steps of base do begin
643 +            let t = tmin if k <= bias {+ tmin}, or
644 +                    tmax if k >= bias + tmax, or k - bias otherwise
645 +            if q < t then break
646 +            output the code point for digit t + ((q - t) mod (base - t))
647 +            let q = (q - t) div (base - t)
648 +          end
649 +          output the code point for digit q
650 +          let bias = adapt(delta, h + 1, test h equals b?)
651 +          let delta = 0
652 +          increment h
653 +        end
654 +      end
655 +      increment delta and n
656 +    end
657 +
658 +    The full statement enclosed in braces (checking whether the input
659 +    contains a non-basic code point less than n) can be omitted if all
660 +    code points less than initial_n are basic code points (which is true
661 +    for Punycode if code points are unsigned).
662 +
663 +    The brace-enclosed conditions "non-basic" and "or c is basic" can be
664 +    omitted if initial_n exceeds all basic code points (which is true for
665 +    Punycode), because the code point being tested is never less than
666 +    initial_n.
667 +
668 +    In the assignment of t, where t is clamped to the range tmin through
669 +    tmax, "+ tmin" can always be omitted.
This makes the clamping 670 + calculation incorrect when bias < k < bias + tmin, but that cannot 671 + 672 + 673 + 674 + Costello Standards Track [Page 12] 675 + 676 + RFC 3492 IDNA Punycode March 2003 677 + 678 + 679 + happen because of the way bias is computed and because of the 680 + constraints on the parameters. 681 + 682 + The checks for overflow are necessary to avoid producing invalid 683 + output when the input contains very large values or is very long. 684 + 685 + The increment of delta at the bottom of the outer loop cannot 686 + overflow because delta < length(input) before the increment, and 687 + length(input) is already assumed to be representable. The increment 688 + of n could overflow, but only if h == length(input), in which case 689 + the procedure is finished anyway. 690 + 691 + 6.4 Overflow handling 692 + 693 + For IDNA, 26-bit unsigned integers are sufficient to handle all valid 694 + IDNA labels without overflow, because any string that needed a 27-bit 695 + delta would have to exceed either the code point limit (0..10FFFF) or 696 + the label length limit (63 characters). However, overflow handling 697 + is necessary because the inputs are not necessarily valid IDNA 698 + labels. 699 + 700 + If the programming language does not provide overflow detection, the 701 + following technique can be used. Suppose A, B, and C are 702 + representable nonnegative integers and C is nonzero. Then A + B 703 + overflows if and only if B > maxint - A, and A + (B * C) overflows if 704 + and only if B > (maxint - A) div C, where maxint is the greatest 705 + integer for which maxint + 1 cannot be represented. Refer to 706 + appendix C "Punycode sample implementation" for demonstrations of 707 + this technique in the C language. 708 + 709 + The decoding and encoding algorithms shown in sections 6.2 and 6.3 710 + handle overflow by detecting it whenever it happens. 
Another 711 + approach is to enforce limits on the inputs that prevent overflow 712 + from happening. For example, if the encoder were to verify that no 713 + input code points exceed M and that the input length does not exceed 714 + L, then no delta could ever exceed (M - initial_n) * (L + 1), and 715 + hence no overflow could occur if integer variables were capable of 716 + representing values that large. This prevention approach would 717 + impose more restrictions on the input than the detection approach 718 + does, but might be considered simpler in some programming languages. 719 + 720 + In theory, the decoder could use an analogous approach, limiting the 721 + number of digits in a variable-length integer (that is, limiting the 722 + number of iterations in the innermost loop). However, the number of 723 + digits that suffice to represent a given delta can sometimes 724 + represent much larger deltas (because of the adaptation), and hence 725 + this approach would probably need integers wider than 32 bits. 726 + 727 + 728 + 729 + 730 + Costello Standards Track [Page 13] 731 + 732 + RFC 3492 IDNA Punycode March 2003 733 + 734 + 735 + Yet another approach for the decoder is to allow overflow to occur, 736 + but to check the final output string by re-encoding it and comparing 737 + to the decoder input. If and only if they do not match (using a 738 + case-insensitive ASCII comparison) overflow has occurred. This 739 + delayed-detection approach would not impose any more restrictions on 740 + the input than the immediate-detection approach does, and might be 741 + considered simpler in some programming languages. 742 + 743 + In fact, if the decoder is used only inside the IDNA ToUnicode 744 + operation [IDNA], then it need not check for overflow at all, because 745 + ToUnicode performs a higher level re-encoding and comparison, and a 746 + mismatch has the same consequence as if the Punycode decoder had 747 + failed. 748 + 749 + 7. 
Punycode examples 750 + 751 + 7.1 Sample strings 752 + 753 + In the Punycode encodings below, the ACE prefix is not shown. 754 + Backslashes show where line breaks have been inserted in strings too 755 + long for one line. 756 + 757 + The first several examples are all translations of the sentence "Why 758 + can't they just speak in <language>?" (courtesy of Michael Kaplan's 759 + "provincial" page [PROVINCIAL]). Word breaks and punctuation have 760 + been removed, as is often done in domain names. 761 + 762 + (A) Arabic (Egyptian): 763 + u+0644 u+064A u+0647 u+0645 u+0627 u+0628 u+062A u+0643 u+0644 764 + u+0645 u+0648 u+0634 u+0639 u+0631 u+0628 u+064A u+061F 765 + Punycode: egbpdaj6bu4bxfgehfvwxn 766 + 767 + (B) Chinese (simplified): 768 + u+4ED6 u+4EEC u+4E3A u+4EC0 u+4E48 u+4E0D u+8BF4 u+4E2D u+6587 769 + Punycode: ihqwcrb4cv8a8dqg056pqjye 770 + 771 + (C) Chinese (traditional): 772 + u+4ED6 u+5011 u+7232 u+4EC0 u+9EBD u+4E0D u+8AAA u+4E2D u+6587 773 + Punycode: ihqwctvzc91f659drss3x8bo0yb 774 + 775 + (D) Czech: Pro<ccaron>prost<ecaron>nemluv<iacute><ccaron>esky 776 + U+0050 u+0072 u+006F u+010D u+0070 u+0072 u+006F u+0073 u+0074 777 + u+011B u+006E u+0065 u+006D u+006C u+0075 u+0076 u+00ED u+010D 778 + u+0065 u+0073 u+006B u+0079 779 + Punycode: Proprostnemluvesky-uyb24dma41a 780 + 781 + 782 + 783 + 784 + 785 + 786 + Costello Standards Track [Page 14] 787 + 788 + RFC 3492 IDNA Punycode March 2003 789 + 790 + 791 + (E) Hebrew: 792 + u+05DC u+05DE u+05D4 u+05D4 u+05DD u+05E4 u+05E9 u+05D5 u+05D8 793 + u+05DC u+05D0 u+05DE u+05D3 u+05D1 u+05E8 u+05D9 u+05DD u+05E2 794 + u+05D1 u+05E8 u+05D9 u+05EA 795 + Punycode: 4dbcagdahymbxekheh6e0a7fei0b 796 + 797 + (F) Hindi (Devanagari): 798 + u+092F u+0939 u+0932 u+094B u+0917 u+0939 u+093F u+0928 u+094D 799 + u+0926 u+0940 u+0915 u+094D u+092F u+094B u+0902 u+0928 u+0939 800 + u+0940 u+0902 u+092C u+094B u+0932 u+0938 u+0915 u+0924 u+0947 801 + u+0939 u+0948 u+0902 802 + Punycode: 
i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd 803 + 804 + (G) Japanese (kanji and hiragana): 805 + u+306A u+305C u+307F u+3093 u+306A u+65E5 u+672C u+8A9E u+3092 806 + u+8A71 u+3057 u+3066 u+304F u+308C u+306A u+3044 u+306E u+304B 807 + Punycode: n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa 808 + 809 + (H) Korean (Hangul syllables): 810 + u+C138 u+ACC4 u+C758 u+BAA8 u+B4E0 u+C0AC u+B78C u+B4E4 u+C774 811 + u+D55C u+AD6D u+C5B4 u+B97C u+C774 u+D574 u+D55C u+B2E4 u+BA74 812 + u+C5BC u+B9C8 u+B098 u+C88B u+C744 u+AE4C 813 + Punycode: 989aomsvi5e83db1d2a355cv1e0vak1dwrv93d5xbh15a0dt30a5j\ 814 + psd879ccm6fea98c 815 + 816 + (I) Russian (Cyrillic): 817 + U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E 818 + u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440 819 + u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A 820 + u+0438 821 + Punycode: b1abfaaepdrnnbgefbaDotcwatmq2g4l 822 + 823 + (J) Spanish: Porqu<eacute>nopuedensimplementehablarenEspa<ntilde>ol 824 + U+0050 u+006F u+0072 u+0071 u+0075 u+00E9 u+006E u+006F u+0070 825 + u+0075 u+0065 u+0064 u+0065 u+006E u+0073 u+0069 u+006D u+0070 826 + u+006C u+0065 u+006D u+0065 u+006E u+0074 u+0065 u+0068 u+0061 827 + u+0062 u+006C u+0061 u+0072 u+0065 u+006E U+0045 u+0073 u+0070 828 + u+0061 u+00F1 u+006F u+006C 829 + Punycode: PorqunopuedensimplementehablarenEspaol-fmd56a 830 + 831 + (K) Vietnamese: 832 + T<adotbelow>isaoh<odotbelow>kh<ocirc>ngth<ecirchookabove>ch\ 833 + <ihookabove>n<oacute>iti<ecircacute>ngVi<ecircdotbelow>t 834 + U+0054 u+1EA1 u+0069 u+0073 u+0061 u+006F u+0068 u+1ECD u+006B 835 + u+0068 u+00F4 u+006E u+0067 u+0074 u+0068 u+1EC3 u+0063 u+0068 836 + u+1EC9 u+006E u+00F3 u+0069 u+0074 u+0069 u+1EBF u+006E u+0067 837 + U+0056 u+0069 u+1EC7 u+0074 838 + Punycode: TisaohkhngthchnitingVit-kjcr8268qyxafd2f1b9g 839 + 840 + 841 + 842 + Costello Standards Track [Page 15] 843 + 844 + RFC 3492 IDNA Punycode March 2003 845 + 846 + 847 + The next several examples are all names of Japanese 
music artists, 848 + song titles, and TV programs, just because the author happens to have 849 + them handy (but Japanese is useful for providing examples of single- 850 + row text, two-row text, ideographic text, and various mixtures 851 + thereof). 852 + 853 + (L) 3<nen>B<gumi><kinpachi><sensei> 854 + u+0033 u+5E74 U+0042 u+7D44 u+91D1 u+516B u+5148 u+751F 855 + Punycode: 3B-ww4c5e180e575a65lsy2b 856 + 857 + (M) <amuro><namie>-with-SUPER-MONKEYS 858 + u+5B89 u+5BA4 u+5948 u+7F8E u+6075 u+002D u+0077 u+0069 u+0074 859 + u+0068 u+002D U+0053 U+0055 U+0050 U+0045 U+0052 u+002D U+004D 860 + U+004F U+004E U+004B U+0045 U+0059 U+0053 861 + Punycode: -with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n 862 + 863 + (N) Hello-Another-Way-<sorezore><no><basho> 864 + U+0048 u+0065 u+006C u+006C u+006F u+002D U+0041 u+006E u+006F 865 + u+0074 u+0068 u+0065 u+0072 u+002D U+0057 u+0061 u+0079 u+002D 866 + u+305D u+308C u+305E u+308C u+306E u+5834 u+6240 867 + Punycode: Hello-Another-Way--fc4qua05auwb3674vfr0b 868 + 869 + (O) <hitotsu><yane><no><shita>2 870 + u+3072 u+3068 u+3064 u+5C4B u+6839 u+306E u+4E0B u+0032 871 + Punycode: 2-u9tlzr9756bt3uc0v 872 + 873 + (P) Maji<de>Koi<suru>5<byou><mae> 874 + U+004D u+0061 u+006A u+0069 u+3067 U+004B u+006F u+0069 u+3059 875 + u+308B u+0035 u+79D2 u+524D 876 + Punycode: MajiKoi5-783gue6qz075azm5e 877 + 878 + (Q) <pafii>de<runba> 879 + u+30D1 u+30D5 u+30A3 u+30FC u+0064 u+0065 u+30EB u+30F3 u+30D0 880 + Punycode: de-jg4avhby1noc0d 881 + 882 + (R) <sono><supiido><de> 883 + u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067 884 + Punycode: d9juau41awczczp 885 + 886 + The last example is an ASCII string that breaks the existing rules 887 + for host name labels. (It is not a realistic example for IDNA, 888 + because IDNA never encodes pure ASCII labels.) 
889 + 890 + (S) -> $1.00 <- 891 + u+002D u+003E u+0020 u+0024 u+0031 u+002E u+0030 u+0030 u+0020 892 + u+003C u+002D 893 + Punycode: -> $1.00 <-- 894 + 895 + 896 + 897 + 898 + Costello Standards Track [Page 16] 899 + 900 + RFC 3492 IDNA Punycode March 2003 901 + 902 + 903 + 7.2 Decoding traces 904 + 905 + In the following traces, the evolving state of the decoder is shown 906 + as a sequence of hexadecimal values, representing the code points in 907 + the extended string. An asterisk appears just after the most 908 + recently inserted code point, indicating both n (the value preceeding 909 + the asterisk) and i (the position of the value just after the 910 + asterisk). Other numerical values are decimal. 911 + 912 + Decoding trace of example B from section 7.1: 913 + 914 + n is 128, i is 0, bias is 72 915 + input is "ihqwcrb4cv8a8dqg056pqjye" 916 + there is no delimiter, so extended string starts empty 917 + delta "ihq" decodes to 19853 918 + bias becomes 21 919 + 4E0D * 920 + delta "wc" decodes to 64 921 + bias becomes 20 922 + 4E0D 4E2D * 923 + delta "rb" decodes to 37 924 + bias becomes 13 925 + 4E3A * 4E0D 4E2D 926 + delta "4c" decodes to 56 927 + bias becomes 17 928 + 4E3A 4E48 * 4E0D 4E2D 929 + delta "v8a" decodes to 599 930 + bias becomes 32 931 + 4E3A 4EC0 * 4E48 4E0D 4E2D 932 + delta "8d" decodes to 130 933 + bias becomes 23 934 + 4ED6 * 4E3A 4EC0 4E48 4E0D 4E2D 935 + delta "qg" decodes to 154 936 + bias becomes 25 937 + 4ED6 4EEC * 4E3A 4EC0 4E48 4E0D 4E2D 938 + delta "056p" decodes to 46301 939 + bias becomes 84 940 + 4ED6 4EEC 4E3A 4EC0 4E48 4E0D 4E2D 6587 * 941 + delta "qjye" decodes to 88531 942 + bias becomes 90 943 + 4ED6 4EEC 4E3A 4EC0 4E48 4E0D 8BF4 * 4E2D 6587 944 + 945 + 946 + 947 + 948 + 949 + 950 + 951 + 952 + 953 + 954 + Costello Standards Track [Page 17] 955 + 956 + RFC 3492 IDNA Punycode March 2003 957 + 958 + 959 + Decoding trace of example L from section 7.1: 960 + 961 + n is 128, i is 0, bias is 72 962 + input is 
"3B-ww4c5e180e575a65lsy2b" 963 + literal portion is "3B-", so extended string starts as: 964 + 0033 0042 965 + delta "ww4c" decodes to 62042 966 + bias becomes 27 967 + 0033 0042 5148 * 968 + delta "5e" decodes to 139 969 + bias becomes 24 970 + 0033 0042 516B * 5148 971 + delta "180e" decodes to 16683 972 + bias becomes 67 973 + 0033 5E74 * 0042 516B 5148 974 + delta "575a" decodes to 34821 975 + bias becomes 82 976 + 0033 5E74 0042 516B 5148 751F * 977 + delta "65l" decodes to 14592 978 + bias becomes 67 979 + 0033 5E74 0042 7D44 * 516B 5148 751F 980 + delta "sy2b" decodes to 42088 981 + bias becomes 84 982 + 0033 5E74 0042 7D44 91D1 * 516B 5148 751F 983 + 984 + 985 + 986 + 987 + 988 + 989 + 990 + 991 + 992 + 993 + 994 + 995 + 996 + 997 + 998 + 999 + 1000 + 1001 + 1002 + 1003 + 1004 + 1005 + 1006 + 1007 + 1008 + 1009 + 1010 + Costello Standards Track [Page 18] 1011 + 1012 + RFC 3492 IDNA Punycode March 2003 1013 + 1014 + 1015 + 7.3 Encoding traces 1016 + 1017 + In the following traces, code point values are hexadecimal, while 1018 + other numerical values are decimal. 
+
+ Encoding trace of example B from section 7.1:
+
+ bias is 72
+ input is:
+ 4ED6 4EEC 4E3A 4EC0 4E48 4E0D 8BF4 4E2D 6587
+ there are no basic code points, so no literal portion
+ next code point to insert is 4E0D
+ needed delta is 19853, encodes as "ihq"
+ bias becomes 21
+ next code point to insert is 4E2D
+ needed delta is 64, encodes as "wc"
+ bias becomes 20
+ next code point to insert is 4E3A
+ needed delta is 37, encodes as "rb"
+ bias becomes 13
+ next code point to insert is 4E48
+ needed delta is 56, encodes as "4c"
+ bias becomes 17
+ next code point to insert is 4EC0
+ needed delta is 599, encodes as "v8a"
+ bias becomes 32
+ next code point to insert is 4ED6
+ needed delta is 130, encodes as "8d"
+ bias becomes 23
+ next code point to insert is 4EEC
+ needed delta is 154, encodes as "qg"
+ bias becomes 25
+ next code point to insert is 6587
+ needed delta is 46301, encodes as "056p"
+ bias becomes 84
+ next code point to insert is 8BF4
+ needed delta is 88531, encodes as "qjye"
+ bias becomes 90
+ output is "ihqwcrb4cv8a8dqg056pqjye"
+
+ Encoding trace of example L from section 7.1:
+
+ bias is 72
+ input is:
+ 0033 5E74 0042 7D44 91D1 516B 5148 751F
+ basic code points (0033, 0042) are copied to literal portion: "3B-"
+ next code point to insert is 5148
+ needed delta is 62042, encodes as "ww4c"
+ bias becomes 27
+ next code point to insert is 516B
+ needed delta is 139, encodes as "5e"
+ bias becomes 24
+ next code point to insert is 5E74
+ needed delta is 16683, encodes as "180e"
+ bias becomes 67
+ next code point
to insert is 751F
+ needed delta is 34821, encodes as "575a"
+ bias becomes 82
+ next code point to insert is 7D44
+ needed delta is 14592, encodes as "65l"
+ bias becomes 67
+ next code point to insert is 91D1
+ needed delta is 42088, encodes as "sy2b"
+ bias becomes 84
+ output is "3B-ww4c5e180e575a65lsy2b"
+
+ 8. Security Considerations
+
+ Users expect each domain name in DNS to be controlled by a single
+ authority. If a Unicode string intended for use as a domain label
+ could map to multiple ACE labels, then an internationalized domain
+ name could map to multiple ASCII domain names, each controlled by a
+ different authority, some of which could be spoofs that hijack
+ service requests intended for another. Therefore Punycode is
+ designed so that each Unicode string has a unique encoding.
+
+ However, there can still be multiple Unicode representations of the
+ "same" text, for various definitions of "same". This problem is
+ addressed to some extent by the Unicode standard under the topic of
+ canonicalization, and this work is leveraged for domain names by
+ Nameprep [NAMEPREP].
+
+ 9. References
+
+ 9.1 Normative References
+
+ [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
+              Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+ 9.2 Informative References
+
+ [RFC952]     Harrenstien, K., Stahl, M. and E. Feinler, "DOD Internet
+              Host Table Specification", RFC 952, October 1985.
+
+ [RFC1034]    Mockapetris, P., "Domain Names - Concepts and
+              Facilities", STD 13, RFC 1034, November 1987.
+
+ [IDNA]       Faltstrom, P., Hoffman, P. and A.
Costello,
+              "Internationalizing Domain Names in Applications
+              (IDNA)", RFC 3490, March 2003.
+
+ [NAMEPREP]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
+              Profile for Internationalized Domain Names (IDN)", RFC
+              3491, March 2003.
+
+ [ASCII]      Cerf, V., "ASCII format for Network Interchange", RFC
+              20, October 1969.
+
+ [PROVINCIAL] Kaplan, M., "The 'anyone can be provincial!' page",
+              http://www.trigeminal.com/samples/provincial.html.
+
+ [UNICODE]    The Unicode Consortium, "The Unicode Standard",
+              http://www.unicode.org/unicode/standard/standard.html.
+
+ A. Mixed-case annotation
+
+ In order to use Punycode to represent case-insensitive strings,
+ higher layers need to case-fold the strings prior to Punycode
+ encoding. The encoded string can use mixed case as an annotation
+ telling how to convert the folded string into a mixed-case string for
+ display purposes. Note, however, that mixed-case annotation is not
+ used by the ToASCII and ToUnicode operations specified in [IDNA], and
+ therefore implementors of IDNA can disregard this appendix.
+
+ Basic code points can use mixed case directly, because the decoder
+ copies them verbatim, leaving lowercase code points lowercase, and
+ leaving uppercase code points uppercase. Each non-basic code point
+ is represented by a delta, which is represented by a sequence of
+ basic code points, the last of which provides the annotation.
If it
+ is uppercase, it is a suggestion to map the non-basic code point to
+ uppercase (if possible); if it is lowercase, it is a suggestion to
+ map the non-basic code point to lowercase (if possible).
+
+ These annotations do not alter the code points returned by decoders;
+ the annotations are returned separately, for the caller to use or
+ ignore. Encoders can accept annotations in addition to code points,
+ but the annotations do not alter the output, except to influence the
+ uppercase/lowercase form of ASCII letters.
+
+ Punycode encoders and decoders need not support these annotations,
+ and higher layers need not use them.
+
+ B. Disclaimer and license
+
+ Regarding this entire document or any portion of it (including the
+ pseudocode and C code), the author makes no guarantees and is not
+ responsible for any damage resulting from its use. The author grants
+ irrevocable permission to anyone to use, modify, and distribute it in
+ any way that does not diminish the rights of anyone else to use,
+ modify, and distribute it, provided that redistributed derivative
+ works do not contain misleading author or version information.
+ Derivative works need not be licensed under similar terms.
+
+ C. Punycode sample implementation
+
+ /*
+ punycode.c from RFC 3492
+ http://www.nicemice.net/idn/
+ Adam M. Costello
+ http://www.nicemice.net/amc/
+
+ This is ANSI C code (C89) implementing Punycode (RFC 3492).
+
+ */
+
+ /************************************************************/
+ /* Public interface (would normally go in its own .h file): */
+
+ #include <limits.h>
+
+ enum punycode_status {
+   punycode_success,
+   punycode_bad_input,   /* Input is invalid. */
+   punycode_big_output,  /* Output would exceed the space provided. */
+   punycode_overflow     /* Input needs wider integers to process. */
+ };
+
+ #if UINT_MAX >= (1 << 26) - 1
+ typedef unsigned int punycode_uint;
+ #else
+ typedef unsigned long punycode_uint;
+ #endif
+
+ enum punycode_status punycode_encode(
+   punycode_uint input_length,
+   const punycode_uint input[],
+   const unsigned char case_flags[],
+   punycode_uint *output_length,
+   char output[] );
+
+ /* punycode_encode() converts Unicode to Punycode. The input */
+ /* is represented as an array of Unicode code points (not code */
+ /* units; surrogate pairs are not allowed), and the output */
+ /* will be represented as an array of ASCII code points. The */
+ /* output string is *not* null-terminated; it will contain */
+ /* zeros if and only if the input contains zeros. (Of course */
+ /* the caller can leave room for a terminator and add one if */
+ /* needed.) The input_length is the number of code points in */
+ /* the input. The output_length is an in/out argument: the */
+ /* caller passes in the maximum number of code points that it */
+ /* can receive, and on successful return it will contain the */
+ /* number of code points actually output.
The case_flags array */
+ /* holds input_length boolean values, where nonzero suggests that */
+ /* the corresponding Unicode character be forced to uppercase */
+ /* after being decoded (if possible), and zero suggests that */
+ /* it be forced to lowercase (if possible). ASCII code points */
+ /* are encoded literally, except that ASCII letters are forced */
+ /* to uppercase or lowercase according to the corresponding */
+ /* uppercase flags. If case_flags is a null pointer then ASCII */
+ /* letters are left as they are, and other code points are */
+ /* treated as if their uppercase flags were zero. The return */
+ /* value can be any of the punycode_status values defined above */
+ /* except punycode_bad_input; if not punycode_success, then */
+ /* output_length and output might contain garbage. */
+
+ enum punycode_status punycode_decode(
+   punycode_uint input_length,
+   const char input[],
+   punycode_uint *output_length,
+   punycode_uint output[],
+   unsigned char case_flags[] );
+
+ /* punycode_decode() converts Punycode to Unicode. The input is */
+ /* represented as an array of ASCII code points, and the output */
+ /* will be represented as an array of Unicode code points. The */
+ /* input_length is the number of code points in the input. The */
+ /* output_length is an in/out argument: the caller passes in */
+ /* the maximum number of code points that it can receive, and */
+ /* on successful return it will contain the actual number of */
+ /* code points output. The case_flags array needs room for at */
+ /* least output_length values, or it can be a null pointer if the */
+ /* case information is not needed.
A nonzero flag suggests that */
+ /* the corresponding Unicode character be forced to uppercase */
+ /* by the caller (if possible), while zero suggests that it be */
+ /* forced to lowercase (if possible). ASCII code points are */
+ /* output already in the proper case, but their flags will be set */
+ /* appropriately so that applying the flags would be harmless. */
+ /* The return value can be any of the punycode_status values */
+ /* defined above; if not punycode_success, then output_length, */
+ /* output, and case_flags might contain garbage. On success, the */
+ /* decoder will never need to write an output_length greater than */
+ /* input_length, because of how the encoding is defined. */
+
+ /**********************************************************/
+ /* Implementation (would normally go in its own .c file): */
+
+ #include <string.h>
+
+ /*** Bootstring parameters for Punycode ***/
+
+ enum { base = 36, tmin = 1, tmax = 26, skew = 38, damp = 700,
+        initial_bias = 72, initial_n = 0x80, delimiter = 0x2D };
+
+ /* basic(cp) tests whether cp is a basic code point: */
+ #define basic(cp) ((punycode_uint)(cp) < 0x80)
+
+ /* delim(cp) tests whether cp is a delimiter: */
+ #define delim(cp) ((cp) == delimiter)
+
+ /* decode_digit(cp) returns the numeric value of a basic code */
+ /* point (for use in representing integers) in the range 0 to */
+ /* base-1, or base if cp does not represent a value. */
+
+ static punycode_uint decode_digit(punycode_uint cp)
+ {
+   return cp - 48 < 10 ? cp - 22 : cp - 65 < 26 ? cp - 65 :
+          cp - 97 < 26 ?
cp - 97 : base;
+ }
+
+ /* encode_digit(d,flag) returns the basic code point whose value */
+ /* (when used for representing integers) is d, which needs to be in */
+ /* the range 0 to base-1. The lowercase form is used unless flag is */
+ /* nonzero, in which case the uppercase form is used. The behavior */
+ /* is undefined if flag is nonzero and digit d has no uppercase form. */
+
+ static char encode_digit(punycode_uint d, int flag)
+ {
+   return d + 22 + 75 * (d < 26) - ((flag != 0) << 5);
+   /* 0..25 map to ASCII a..z or A..Z */
+   /* 26..35 map to ASCII 0..9 */
+ }
+
+ /* flagged(bcp) tests whether a basic code point is flagged */
+ /* (uppercase). The behavior is undefined if bcp is not a */
+ /* basic code point. */
+
+ #define flagged(bcp) ((punycode_uint)(bcp) - 65 < 26)
+
+ /* encode_basic(bcp,flag) forces a basic code point to lowercase */
+ /* if flag is zero, uppercase if flag is nonzero, and returns */
+ /* the resulting code point. The code point is unchanged if it */
+ /* is caseless. The behavior is undefined if bcp is not a basic */
+ /* code point. */
+
+ static char encode_basic(punycode_uint bcp, int flag)
+ {
+   bcp -= (bcp - 97 < 26) << 5;
+   return bcp + ((!flag && (bcp - 65 < 26)) << 5);
+ }
+
+ /*** Platform-specific constants ***/
+
+ /* maxint is the maximum value of a punycode_uint variable: */
+ static const punycode_uint maxint = -1;
+ /* Because maxint is unsigned, -1 becomes the maximum value. */
+
+ /*** Bias adaptation function ***/
+
+ static punycode_uint adapt(
+   punycode_uint delta, punycode_uint numpoints, int firsttime )
+ {
+   punycode_uint k;
+
+   delta = firsttime ?
delta / damp : delta >> 1;
+   /* delta >> 1 is a faster way of doing delta / 2 */
+   delta += delta / numpoints;
+
+   for (k = 0; delta > ((base - tmin) * tmax) / 2; k += base) {
+     delta /= base - tmin;
+   }
+
+   return k + (base - tmin + 1) * delta / (delta + skew);
+ }
+
+ /*** Main encode function ***/
+
+ enum punycode_status punycode_encode(
+   punycode_uint input_length,
+   const punycode_uint input[],
+   const unsigned char case_flags[],
+   punycode_uint *output_length,
+   char output[] )
+ {
+   punycode_uint n, delta, h, b, out, max_out, bias, j, m, q, k, t;
+
+   /* Initialize the state: */
+
+   n = initial_n;
+   delta = out = 0;
+   max_out = *output_length;
+   bias = initial_bias;
+
+   /* Handle the basic code points: */
+
+   for (j = 0; j < input_length; ++j) {
+     if (basic(input[j])) {
+       if (max_out - out < 2) return punycode_big_output;
+       output[out++] =
+         case_flags ? encode_basic(input[j], case_flags[j]) : input[j];
+     }
+     /* else if (input[j] < n) return punycode_bad_input; */
+     /* (not needed for Punycode with unsigned code points) */
+   }
+
+   h = b = out;
+
+   /* h is the number of code points that have been handled, b is the */
+   /* number of basic code points, and out is the number of characters */
+   /* that have been output. */
+
+   if (b > 0) output[out++] = delimiter;
+
+   /* Main encoding loop: */
+
+   while (h < input_length) {
+     /* All non-basic code points < n have been */
+     /* handled already.
Find the next larger one: */
+
+     for (m = maxint, j = 0; j < input_length; ++j) {
+       /* if (basic(input[j])) continue; */
+       /* (not needed for Punycode) */
+       if (input[j] >= n && input[j] < m) m = input[j];
+     }
+
+     /* Increase delta enough to advance the decoder's */
+     /* <n,i> state to <m,0>, but guard against overflow: */
+
+     if (m - n > (maxint - delta) / (h + 1)) return punycode_overflow;
+     delta += (m - n) * (h + 1);
+     n = m;
+
+     for (j = 0; j < input_length; ++j) {
+       /* Punycode does not need to check whether input[j] is basic: */
+       if (input[j] < n /* || basic(input[j]) */ ) {
+         if (++delta == 0) return punycode_overflow;
+       }
+
+       if (input[j] == n) {
+         /* Represent delta as a generalized variable-length integer: */
+
+         for (q = delta, k = base; ; k += base) {
+           if (out >= max_out) return punycode_big_output;
+           t = k <= bias /* + tmin */ ? tmin : /* +tmin not needed */
+               k >= bias + tmax ?
tmax : k - bias;
+           if (q < t) break;
+           output[out++] = encode_digit(t + (q - t) % (base - t), 0);
+           q = (q - t) / (base - t);
+         }
+
+         output[out++] = encode_digit(q, case_flags && case_flags[j]);
+         bias = adapt(delta, h + 1, h == b);
+         delta = 0;
+         ++h;
+       }
+     }
+
+     ++delta, ++n;
+   }
+
+   *output_length = out;
+   return punycode_success;
+ }
+
+ /*** Main decode function ***/
+
+ enum punycode_status punycode_decode(
+   punycode_uint input_length,
+   const char input[],
+   punycode_uint *output_length,
+   punycode_uint output[],
+   unsigned char case_flags[] )
+ {
+   punycode_uint n, out, i, max_out, bias,
+                 b, j, in, oldi, w, k, digit, t;
+
+   /* Initialize the state: */
+
+   n = initial_n;
+   out = i = 0;
+   max_out = *output_length;
+   bias = initial_bias;
+
+   /* Handle the basic code points: Let b be the number of input code */
+   /* points before the last delimiter, or 0 if there is none, then */
+   /* copy the first b code points to the output. */
+
+   for (b = j = 0; j < input_length; ++j) if (delim(input[j])) b = j;
+   if (b > max_out) return punycode_big_output;
+
+   for (j = 0; j < b; ++j) {
+     if (case_flags) case_flags[out] = flagged(input[j]);
+     if (!basic(input[j])) return punycode_bad_input;
+     output[out++] = input[j];
+   }
+
+   /* Main decoding loop: Start just after the last delimiter if any */
+   /* basic code points were copied; start at the beginning otherwise. */
+
+   for (in = b > 0 ? b + 1 : 0; in < input_length; ++out) {
+
+     /* in is the index of the next character to be consumed, and */
+     /* out is the number of code points in the output array.
*/
+
+     /* Decode a generalized variable-length integer into delta, */
+     /* which gets added to i. The overflow checking is easier */
+     /* if we increase i as we go, then subtract off its starting */
+     /* value at the end to obtain delta. */
+
+     for (oldi = i, w = 1, k = base; ; k += base) {
+       if (in >= input_length) return punycode_bad_input;
+       digit = decode_digit(input[in++]);
+       if (digit >= base) return punycode_bad_input;
+       if (digit > (maxint - i) / w) return punycode_overflow;
+       i += digit * w;
+       t = k <= bias /* + tmin */ ? tmin : /* +tmin not needed */
+           k >= bias + tmax ? tmax : k - bias;
+       if (digit < t) break;
+       if (w > maxint / (base - t)) return punycode_overflow;
+       w *= (base - t);
+     }
+
+     bias = adapt(i - oldi, out + 1, oldi == 0);
+
+     /* i was supposed to wrap around from out+1 to 0, */
+     /* incrementing n each time, so we'll fix that now: */
+
+     if (i / (out + 1) > maxint - n) return punycode_overflow;
+     n += i / (out + 1);
+     i %= (out + 1);
+
+     /* Insert n at position i of the output: */
+
+     /* not needed for Punycode: */
+     /* if (decode_digit(n) <= base) return punycode_invalid_input; */
+     if (out >= max_out) return punycode_big_output;
+
+     if (case_flags) {
+       memmove(case_flags + i + 1, case_flags + i, out - i);
+       /* Case of last character determines uppercase flag: */
+       case_flags[i] = flagged(input[in - 1]);
+     }
+
+     memmove(output + i + 1, output + i, (out - i) * sizeof *output);
+     output[i++] = n;
+   }
+
+   *output_length = out;
+   return punycode_success;
+ }
+
+ /******************************************************************/
+ /* Wrapper for testing (would normally go in
a separate .c file): */
+
+ #include <assert.h>
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <string.h>
+
+ /* For testing, we'll just set some compile-time limits rather than */
+ /* use malloc(), and set a compile-time option rather than using a */
+ /* command-line option. */
+
+ enum {
+   unicode_max_length = 256,
+   ace_max_length = 256
+ };
+
+ static void usage(char **argv)
+ {
+   fprintf(stderr,
+     "\n"
+     "%s -e reads code points and writes a Punycode string.\n"
+     "%s -d reads a Punycode string and writes code points.\n"
+     "\n"
+     "Input and output are plain text in the native character set.\n"
+     "Code points are in the form u+hex separated by whitespace.\n"
+     "Although the specification allows Punycode strings to contain\n"
+     "any characters from the ASCII repertoire, this test code\n"
+     "supports only the printable characters, and needs the Punycode\n"
+     "string to be followed by a newline.\n"
+     "The case of the u in u+hex is the force-to-uppercase flag.\n"
+     , argv[0], argv[0]);
+   exit(EXIT_FAILURE);
+ }
+
+ static void fail(const char *msg)
+ {
+   fputs(msg,stderr);
+   exit(EXIT_FAILURE);
+ }
+
+ static const char too_big[] =
+   "input or output is too large, recompile with larger limits\n";
+ static const char invalid_input[] = "invalid input\n";
+ static const char overflow[] = "arithmetic overflow\n";
+ static const char io_error[] = "I/O error\n";
+
+ /* The following string is used to convert printable */
+ /* characters between ASCII and the native charset: */
+
+ static const char print_ascii[] =
+   "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
+   "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
+   " !\"#$%&'()*+,-./"
+   "0123456789:;<=>?"
+   "@ABCDEFGHIJKLMNO"
+   "PQRSTUVWXYZ[\\]^_"
+   "`abcdefghijklmno"
+   "pqrstuvwxyz{|}~\n";
+
+ int main(int argc, char **argv)
+ {
+   enum punycode_status status;
+   int r;
+   unsigned int input_length, output_length, j;
+   unsigned char case_flags[unicode_max_length];
+
+   if (argc != 2) usage(argv);
+   if (argv[1][0] != '-') usage(argv);
+   if (argv[1][2] != 0) usage(argv);
+
+   if (argv[1][1] == 'e') {
+     punycode_uint input[unicode_max_length];
+     unsigned long codept;
+     char output[ace_max_length+1], uplus[3];
+     int c;
+
+     /* Read the input code points: */
+
+     input_length = 0;
+
+     for (;;) {
+       r = scanf("%2s%lx", uplus, &codept);
+       if (ferror(stdin)) fail(io_error);
+       if (r == EOF || r == 0) break;
+
+       if (r != 2 || uplus[1] != '+' || codept > (punycode_uint)-1) {
+         fail(invalid_input);
+       }
+
+       if (input_length == unicode_max_length) fail(too_big);
+
+       if (uplus[0] == 'u') case_flags[input_length] = 0;
+       else if (uplus[0] == 'U') case_flags[input_length] = 1;
+       else fail(invalid_input);
+
+       input[input_length++] = codept;
+     }
+
+     /* Encode: */
+
+     output_length = ace_max_length;
+     status = punycode_encode(input_length, input, case_flags,
+                              &output_length, output);
+     if (status == punycode_bad_input) fail(invalid_input);
+     if (status == punycode_big_output) fail(too_big);
+     if (status == punycode_overflow) fail(overflow);
+     assert(status == punycode_success);
+
+     /* Convert to native charset and output: */
+
+     for (j = 0; j < output_length; ++j) {
+       c = output[j];
+       assert(c >= 0 && c <= 127);
+       if
(print_ascii[c] == 0) fail(invalid_input);
+       output[j] = print_ascii[c];
+     }
+
+     output[j] = 0;
+     r = puts(output);
+     if (r == EOF) fail(io_error);
+     return EXIT_SUCCESS;
+   }
+
+   if (argv[1][1] == 'd') {
+     char input[ace_max_length+2], *p, *pp;
+     punycode_uint output[unicode_max_length];
+
+     /* Read the Punycode input string and convert to ASCII: */
+
+     fgets(input, ace_max_length+2, stdin);
+     if (ferror(stdin)) fail(io_error);
+     if (feof(stdin)) fail(invalid_input);
+     input_length = strlen(input) - 1;
+     if (input[input_length] != '\n') fail(too_big);
+     input[input_length] = 0;
+
+     for (p = input; *p != 0; ++p) {
+       pp = strchr(print_ascii, *p);
+       if (pp == 0) fail(invalid_input);
+       *p = pp - print_ascii;
+     }
+
+     /* Decode: */
+
+     output_length = unicode_max_length;
+     status = punycode_decode(input_length, input, &output_length,
+                              output, case_flags);
+     if (status == punycode_bad_input) fail(invalid_input);
+     if (status == punycode_big_output) fail(too_big);
+     if (status == punycode_overflow) fail(overflow);
+     assert(status == punycode_success);
+
+     /* Output the result: */
+
+     for (j = 0; j < output_length; ++j) {
+       r = printf("%s+%04lX\n",
+                  case_flags[j] ?
"U" : "u", 1825 + (unsigned long) output[j] ); 1826 + if (r < 0) fail(io_error); 1827 + } 1828 + 1829 + return EXIT_SUCCESS; 1830 + } 1831 + 1832 + usage(argv); 1833 + return EXIT_SUCCESS; /* not reached, but quiets compiler warning */ 1834 + } 1835 + 1836 + 1837 + 1838 + 1839 + 1840 + 1841 + 1842 + 1843 + 1844 + 1845 + 1846 + 1847 + 1848 + 1849 + 1850 + Costello Standards Track [Page 33] 1851 + 1852 + RFC 3492 IDNA Punycode March 2003 1853 + 1854 + 1855 + Author's Address 1856 + 1857 + Adam M. Costello 1858 + University of California, Berkeley 1859 + http://www.nicemice.net/amc/ 1860 + 1861 + 1862 + 1863 + 1864 + 1865 + 1866 + 1867 + 1868 + 1869 + 1870 + 1871 + 1872 + 1873 + 1874 + 1875 + 1876 + 1877 + 1878 + 1879 + 1880 + 1881 + 1882 + 1883 + 1884 + 1885 + 1886 + 1887 + 1888 + 1889 + 1890 + 1891 + 1892 + 1893 + 1894 + 1895 + 1896 + 1897 + 1898 + 1899 + 1900 + 1901 + 1902 + 1903 + 1904 + 1905 + 1906 + Costello Standards Track [Page 34] 1907 + 1908 + RFC 3492 IDNA Punycode March 2003 1909 + 1910 + 1911 + Full Copyright Statement 1912 + 1913 + Copyright (C) The Internet Society (2003). All Rights Reserved. 1914 + 1915 + This document and translations of it may be copied and furnished to 1916 + others, and derivative works that comment on or otherwise explain it 1917 + or assist in its implementation may be prepared, copied, published 1918 + and distributed, in whole or in part, without restriction of any 1919 + kind, provided that the above copyright notice and this paragraph are 1920 + included on all such copies and derivative works. 
However, this
+ document itself may not be modified in any way, such as by removing
+ the copyright notice or references to the Internet Society or other
+ Internet organizations, except as needed for the purpose of
+ developing Internet standards in which case the procedures for
+ copyrights defined in the Internet Standards process must be
+ followed, or as required to translate it into languages other than
+ English.
+
+ The limited permissions granted above are perpetual and will not be
+ revoked by the Internet Society or its successors or assigns.
+
+ This document and the information contained herein is provided on an
+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+ Acknowledgement
+
+ Funding for the RFC Editor function is currently provided by the
+ Internet Society.
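The traces in sections 7.2 and 7.3 of the RFC text above can be spot-checked without the full encoder. What follows is a minimal C sketch, not part of this change and not the RFC's own code: it re-implements only the generalized variable-length integer step and the bias adaptation described in Appendix C, using the same Bootstring parameters. The names `encode_delta`, `decode_delta`, and `adapt_bias` are hypothetical, and the sketch assumes an ASCII execution character set and lowercase-only output digits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bootstring parameters for Punycode (values from RFC 3492, Appendix C). */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700 };

/* Threshold t for the digit at position k (k = BASE, 2*BASE, ...). */
static unsigned long threshold(unsigned long k, unsigned long bias)
{
    if (k <= bias) return TMIN;
    if (k >= bias + TMAX) return TMAX;
    return k - bias;
}

/* Digit value to character: 0..25 -> 'a'..'z', 26..35 -> '0'..'9'.
   Hypothetical helper; assumes ASCII and always emits lowercase. */
static char digit_to_char(unsigned long d)
{
    assert(d < BASE);
    return (char)(d < 26 ? 'a' + (int)d : '0' + (int)(d - 26));
}

/* Character to digit value, or BASE if the character is not a digit. */
static unsigned long char_to_digit(char c)
{
    if (c >= 'a' && c <= 'z') return (unsigned long)(c - 'a');
    if (c >= 'A' && c <= 'Z') return (unsigned long)(c - 'A');
    if (c >= '0' && c <= '9') return (unsigned long)(c - '0') + 26;
    return BASE;
}

/* Writes one delta as a NUL-terminated digit string into out.
   Returns the number of digits written, or 0 if cap is too small. */
static size_t encode_delta(unsigned long delta, unsigned long bias,
                           char *out, size_t cap)
{
    size_t n = 0;
    unsigned long q = delta, k, t;

    for (k = BASE; ; k += BASE) {
        if (n + 1 >= cap) return 0;   /* room for this digit plus NUL */
        t = threshold(k, bias);
        if (q < t) break;
        out[n++] = digit_to_char(t + (q - t) % (BASE - t));
        q = (q - t) / (BASE - t);
    }
    out[n++] = digit_to_char(q);      /* final digit is below its threshold */
    out[n] = '\0';
    return n;
}

/* Reads one delta from the start of s. Returns 1 on success and stores
   the value and the number of characters consumed; returns 0 on an
   invalid digit (including a premature NUL) or on overflow. */
static int decode_delta(const char *s, unsigned long bias,
                        unsigned long *delta, size_t *used)
{
    unsigned long i = 0, w = 1, k, t, d;
    size_t n = 0;

    for (k = BASE; ; k += BASE) {
        d = char_to_digit(s[n]);
        if (d >= BASE) return 0;
        ++n;
        if (d > (ULONG_MAX - i) / w) return 0;
        i += d * w;
        t = threshold(k, bias);
        if (d < t) break;
        if (w > ULONG_MAX / (BASE - t)) return 0;
        w *= BASE - t;
    }
    *delta = i;
    *used = n;
    return 1;
}

/* Bias adaptation, following the adapt() function in Appendix C. */
static unsigned long adapt_bias(unsigned long delta, unsigned long numpoints,
                                int firsttime)
{
    unsigned long k = 0;

    delta = firsttime ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}
```

With these helpers, encoding delta 19853 at the initial bias of 72 should give `ihq`, and adapting on that first delta should give a bias of 21, which are the values listed for the first step of the example B traces. A check of this kind could serve as a quick regression test for a port of the algorithm, though that is a suggestion rather than something this change does.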
+1291
spec/rfc5890.txt
···
+ Internet Engineering Task Force (IETF)                      J. Klensin
+ Request for Comments: 5890                                 August 2010
+ Obsoletes: 3490
+ Category: Standards Track
+ ISSN: 2070-1721
+
+ Internationalized Domain Names for Applications (IDNA):
+ Definitions and Document Framework
+
+ Abstract
+
+ This document is one of a collection that, together, describe the
+ protocol and usage context for a revision of Internationalized Domain
+ Names for Applications (IDNA), superseding the earlier version. It
+ describes the document collection and provides definitions and other
+ material that are common to the set.
+
+ Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc5890.
+
+ Klensin                      Standards Track                    [Page 1]
+
+ RFC 5890                    IDNA Definitions                 August 2010
+
+ Copyright Notice
+
+ Copyright (c) 2010 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document.
Code Components extracted from this document must 74 + include Simplified BSD License text as described in Section 4.e of 75 + the Trust Legal Provisions and are provided without warranty as 76 + described in the Simplified BSD License. 77 + 78 + This document may contain material from IETF Documents or IETF 79 + Contributions published or made publicly available before November 80 + 10, 2008. The person(s) controlling the copyright in some of this 81 + material may not have granted the IETF Trust the right to allow 82 + modifications of such material outside the IETF Standards Process. 83 + Without obtaining an adequate license from the person(s) controlling 84 + the copyright in such materials, this document may not be modified 85 + outside the IETF Standards Process, and derivative works of it may 86 + not be created outside the IETF Standards Process, except to format 87 + it for publication as an RFC or to translate it into languages other 88 + than English. 89 + 90 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 + 100 + 101 + 102 + 103 + 104 + 105 + 106 + 107 + 108 + 109 + 110 + 111 + 112 + 113 + 114 + Klensin Standards Track [Page 2] 115 + 116 + RFC 5890 IDNA Definitions August 2010 117 + 118 + 119 + Table of Contents 120 + 121 + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 122 + 1.1. IDNA2008 . . . . . . . . . . . . . . . . . . . . . . . . . 4 123 + 1.1.1. Audiences . . . . . . . . . . . . . . . . . . . . . . 4 124 + 1.1.2. Normative Language . . . . . . . . . . . . . . . . . . 5 125 + 1.2. Road Map of IDNA2008 Documents . . . . . . . . . . . . . . 5 126 + 2. Definitions and Terminology . . . . . . . . . . . . . . . . . 6 127 + 2.1. Characters and Character Sets . . . . . . . . . . . . . . 6 128 + 2.2. DNS-Related Terminology . . . . . . . . . . . . . . . . . 6 129 + 2.3. Terminology Specific to IDNA . . . . . . . . . . . . . . . 7 130 + 2.3.1. LDH Label . . . . . . . . . . . . . . . . . . . . . . 7 131 + 2.3.2. 
Terms for IDN Label Codings . . . . . . . . . . . . . 11 132 + 2.3.2.1. IDNA-valid strings, A-label, and U-label . . . . . 11 133 + 2.3.2.2. NR-LDH Label . . . . . . . . . . . . . . . . . . . 13 134 + 2.3.2.3. Internationalized Domain Name and 135 + Internationalized Label . . . . . . . . . . . . . 13 136 + 2.3.2.4. Label Equivalence . . . . . . . . . . . . . . . . 14 137 + 2.3.2.5. ACE Prefix . . . . . . . . . . . . . . . . . . . . 14 138 + 2.3.2.6. Domain Name Slot . . . . . . . . . . . . . . . . . 14 139 + 2.3.3. Order of Characters in Labels . . . . . . . . . . . . 15 140 + 2.3.4. Punycode is an Algorithm, Not a Name or Adjective . . 15 141 + 3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 142 + 4. Security Considerations . . . . . . . . . . . . . . . . . . . 16 143 + 4.1. General Issues . . . . . . . . . . . . . . . . . . . . . . 16 144 + 4.2. U-label Lengths . . . . . . . . . . . . . . . . . . . . . 16 145 + 4.3. Local Character Set Issues . . . . . . . . . . . . . . . . 17 146 + 4.4. Visually Similar Characters . . . . . . . . . . . . . . . 17 147 + 4.5. IDNA Lookup, Registration, and the Base DNS 148 + Specifications . . . . . . . . . . . . . . . . . . . . . . 18 149 + 4.6. Legacy IDN Label Strings . . . . . . . . . . . . . . . . . 18 150 + 4.7. Security Differences from IDNA2003 . . . . . . . . . . . . 19 151 + 4.8. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 20 152 + 5. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 20 153 + 6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20 154 + 6.1. Normative References . . . . . . . . . . . . . . . . . . . 20 155 + 6.2. Informative References . . . . . . . . . . . . . . . . . . 21 156 + 157 + 158 + 159 + 160 + 161 + 162 + 163 + 164 + 165 + 166 + 167 + 168 + 169 + 170 + Klensin Standards Track [Page 3] 171 + 172 + RFC 5890 IDNA Definitions August 2010 173 + 174 + 175 + 1. Introduction 176 + 177 + 1.1. 
IDNA2008 178 + 179 + This document is one of a collection that, together, describe the 180 + protocol and usage context for a revision of Internationalized Domain 181 + Names for Applications (IDNA) that was largely completed in 2008, 182 + known within the series and elsewhere as "IDNA2008". The series 183 + replaces an earlier version of IDNA [RFC3490] [RFC3491]. For 184 + convenience, that version of IDNA is referred to in these documents 185 + as "IDNA2003". The newer version continues to use the Punycode 186 + algorithm [RFC3492] and ACE (ASCII-compatible encoding) prefix from 187 + that earlier version. The document collection is described in 188 + Section 1.2. As indicated there, this document provides definitions 189 + and other material that are common to the set. 190 + 191 + 1.1.1. Audiences 192 + 193 + While many IETF specifications are directed exclusively to protocol 194 + implementers, the character of IDNA requires that it be understood 195 + and properly used by those whose responsibilities include making 196 + decisions about: 197 + 198 + o what names are permitted in DNS zone files, 199 + 200 + o policies related to names and naming, and 201 + 202 + o the handling of domain name strings in files and systems, even 203 + with no immediate intention of looking them up. 204 + 205 + This document and those documents concerned with the protocol 206 + definition, rules for handling strings that include characters 207 + written right to left, and the actual list of characters and 208 + categories will be of primary interest to protocol implementers. 209 + This document and the one containing explanatory material will be of 210 + primary interest to others, although they may have to fill in some 211 + details by reference to other documents in the set. 212 + 213 + This document and the associated ones are written from the 214 + perspective of an IDNA-aware user, application, or implementation. 
215 + While they may reiterate fundamental DNS rules and requirements for 216 + the convenience of the reader, they make no attempt to be 217 + comprehensive about DNS principles and should not be considered as a 218 + substitute for a thorough understanding of the DNS protocols and 219 + specifications. 220 + 221 + 222 + 223 + 224 + 225 + 226 + Klensin Standards Track [Page 4] 227 + 228 + RFC 5890 IDNA Definitions August 2010 229 + 230 + 231 + 1.1.2. Normative Language 232 + 233 + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 234 + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 235 + document are to be interpreted as described in RFC 2119 [RFC2119]. 236 + 237 + 1.2. Road Map of IDNA2008 Documents 238 + 239 + IDNA2008 consists of the following documents: 240 + 241 + o This document, containing definitions and other material that are 242 + needed for understanding other documents in the set. It is 243 + referred to informally in other documents in the set as "Defs" or 244 + "Definitions". 245 + 246 + o A document, RFC 5894 [RFC5894], that provides an overview of the 247 + protocol and associated tables together with explanatory material 248 + and some rationale for the decisions that led to IDNA2008. That 249 + document also contains advice for registry operations and those 250 + who use Internationalized Domain Names (IDNs). It is referred to 251 + informally in other documents in the set as "Rationale". It is 252 + not normative. 253 + 254 + o A document, RFC 5891 [RFC5891], that describes the core IDNA2008 255 + protocol and its operations. In combination with the Bidi 256 + document, described immediately below, it explicitly updates and 257 + replaces RFC 3490. It is referred to informally in other 258 + documents in the set as "Protocol". 259 + 260 + o A document, RFC 5893 [RFC5893], that specifies special rules 261 + (Bidi) for labels that contain characters that are written from 262 + right to left. 
263 + 264 + o A specification, RFC 5892 [RFC5892], of the categories and rules 265 + that identify the code points allowed in a label written in native 266 + character form (defined more specifically as a "U-label" in 267 + Section 2.3.2.1 below), based on Unicode 5.2 [Unicode52] code 268 + point assignments and additional rules unique to IDNA2008. The 269 + Unicode-based rules are expected to be stable across Unicode 270 + updates and hence independent of Unicode versions. That 271 + specification obsoletes RFC 3491 and IDN use of the tables to 272 + which it refers. It is referred to informally in other documents 273 + in the set as "Tables". 274 + 275 + 276 + 277 + 278 + 279 + 280 + 281 + 282 + Klensin Standards Track [Page 5] 283 + 284 + RFC 5890 IDNA Definitions August 2010 285 + 286 + 287 + o A document [IDNA2008-Mapping] that discusses the issue of mapping 288 + characters into other characters and that provides guidance for 289 + doing so when that is appropriate. That document, referred to 290 + informally as "Mapping", provides advice; it is not a required 291 + part of IDNA. 292 + 293 + 2. Definitions and Terminology 294 + 295 + 2.1. Characters and Character Sets 296 + 297 + A code point is an integer value in the codespace of a coded 298 + character set. In Unicode, these are integers from 0 to 0x10FFFF. 299 + 300 + Unicode [Unicode52] is a coded character set containing somewhat over 301 + 100,000 characters assigned to code points as of version 5.2. A 302 + single Unicode code point is denoted in these documents by "U+" 303 + followed by four to six hexadecimal digits, while a range of Unicode 304 + code points is denoted by two four to six digit hexadecimal numbers 305 + separated by "..", with no prefixes. 306 + 307 + ASCII means US-ASCII [ASCII], a coded character set containing 128 308 + characters associated with code points in the range 0000..007F.
309 + Unicode is a superset of ASCII and may be thought of as a 310 + generalization of it; it includes all the ASCII characters and 311 + associates them with the equivalent code points. 312 + 313 + "Letters" are, informally, generalizations from the ASCII and 314 + common-sense understanding of that term, i.e., characters that are 315 + used to write text and that are not digits, symbols, or punctuation. 316 + Formally, they are characters with a Unicode General Category value 317 + starting in "L" (see Section 4.5 of The Unicode Standard 318 + [Unicode52]). 319 + 320 + 2.2. DNS-Related Terminology 321 + 322 + When discussing the DNS, this document generally assumes the 323 + terminology used in the DNS specifications [RFC1034] [RFC1035] as 324 + subsequently modified [RFC1123] [RFC2181]. The term "lookup" is used 325 + to describe the combination of operations performed by the IDNA2008 326 + protocol and those actually performed by a DNS resolver. The process 327 + of placing an entry into the DNS is referred to as "registration". 328 + This is similar to common contemporary usage of that term in other 329 + contexts. Consequently, any DNS zone administration is described as 330 + a "registry", and the terms "registry" and "zone administrator" are 331 + used interchangeably, regardless of the actual administrative 332 + arrangements or level in the DNS tree. More details about that 333 + relationship are included in the Rationale document. 334 + 335 + 336 + 337 + 338 + Klensin Standards Track [Page 6] 339 + 340 + RFC 5890 IDNA Definitions August 2010 341 + 342 + 343 + The term "LDH code point" is defined in this document to refer to the 344 + code points associated with ASCII letters (Unicode code points 345 + 0041..005A and 0061..007A), digits (0030..0039), and the hyphen-minus 346 + (U+002D). 
"LDH" is an abbreviation for "letters, digits, hyphen" but 347 + is used specifically in this document to refer to the set of naming 348 + rules described in Section 2.3.1 below. 349 + 350 + The base DNS specifications [RFC1034] [RFC1035] discuss "domain 351 + names" and "hostnames", but many people use the terms 352 + interchangeably, as do sections of these specifications. Lack of 353 + clarity about that terminology has contributed to confusion about 354 + intent in some cases. These documents generally use the term "domain 355 + name". When they refer to, e.g., hostname syntax restrictions, they 356 + explicitly cite the relevant defining documents. The remaining 357 + definitions in this subsection are essentially a review: if there is 358 + any perceived difference between those definitions and the 359 + definitions in the base DNS documents or those cited below, the 360 + definitions in the other documents take precedence. 361 + 362 + A label is an individual component of a domain name. Labels are 363 + usually shown separated by dots; for example, the domain name 364 + "www.example.com" is composed of three labels: "www", "example", and 365 + "com". (The complete name convention using a trailing dot described 366 + in RFC 1123 [RFC1123], which can be explicit as in "www.example.com." 367 + or implicit as in "www.example.com", is not considered in this 368 + specification.) IDNA extends the set of usable characters in labels 369 + that are treated as text (as distinct from the binary string labels 370 + discussed in RFC 1035 and RFC 2181 [RFC2181] and bitstring ones 371 + [RFC2673]), but only in certain contexts. The different contexts for 372 + different sets of usable characters are outlined in the next section. 373 + For the rest of this document and in the related ones, the term 374 + "label" is shorthand for "text label", and "every label" means "every 375 + text label", including the expanded context. 376 + 377 + 2.3. 
Terminology Specific to IDNA 378 + 379 + This section defines some terminology to reduce dependence on terms 380 + and definitions that have been problematic in the past. The 381 + relationships among these definitions are illustrated in Figure 1 and 382 + Figure 2. In the first of those figures, the parenthesized numbers 383 + refer to the notes below the figure. 384 + 385 + 2.3.1. LDH Label 386 + 387 + This is the classical label form used, albeit with some additional 388 + restrictions, in hostnames [RFC0952]. Its syntax is identical to 389 + that described as the "preferred name syntax" in Section 3.5 of RFC 390 + 1034 [RFC1034] as modified by RFC 1123 [RFC1123]. Briefly, it is a 391 + 392 + 393 + 394 + Klensin Standards Track [Page 7] 395 + 396 + RFC 5890 IDNA Definitions August 2010 397 + 398 + 399 + string consisting of ASCII letters, digits, and the hyphen with the 400 + further restriction that the hyphen cannot appear at the beginning or 401 + end of the string. Like all DNS labels, its total length must not 402 + exceed 63 octets. 403 + 404 + LDH labels include the specialized labels used by IDNA (described as 405 + "A-labels" below) and some additional restricted forms (also 406 + described below). 407 + 408 + To facilitate clear description, two new subsets of LDH labels are 409 + created by the introduction of IDNA. These are called Reserved LDH 410 + labels (R-LDH labels) and Non-Reserved LDH labels (NR-LDH labels). 411 + Reserved LDH labels, known as "tagged domain names" in some other 412 + contexts, have the property that they contain "--" in the third and 413 + fourth characters but which otherwise conform to LDH label rules. 414 + Only a subset of the R-LDH labels can be used in IDNA-aware 415 + applications. That subset consists of the class of labels that begin 416 + with the prefix "xn--" (case independent), but otherwise conform to 417 + the rules for LDH labels. That subset is called "XN-labels" in this 418 + set of documents. 
XN-labels are further divided into those whose 419 + remaining characters (after the "xn--") are valid output of the 420 + Punycode algorithm [RFC3492] and those that are not (see below). The 421 + XN-labels that are valid Punycode output are known as "A-labels" if 422 + they also meet the other criteria for IDNA-validity described below. 423 + Because LDH labels (and, indeed, any DNS label) must not be more than 424 + 63 octets in length, the portion of an XN-label derived from the 425 + Punycode algorithm is limited to no more than 59 ASCII characters. 426 + Non-Reserved LDH labels are the set of valid LDH labels that do not 427 + have "--" in the third and fourth positions. 428 + 429 + A consequence of the restrictions on valid characters in the native 430 + Unicode character form (see U-labels) turns out to be that mixed-case 431 + annotation, of the sort outlined in Appendix A of RFC 3492 [RFC3492], 432 + is never useful. Therefore, since a valid A-label is the result of 433 + Punycode encoding of a U-label, A-labels should be produced only in 434 + lowercase, despite matching other (mixed-case or uppercase) potential 435 + labels in the DNS. 436 + 437 + Some strings that are prefixed with "xn--" to form labels may not be 438 + the output of the Punycode algorithm, may fail the other tests 439 + outlined below, or may violate other IDNA restrictions and thus are 440 + also not valid IDNA labels. They are called "Fake A-labels" for 441 + convenience. 442 + 443 + Labels within the class of R-LDH labels that are not prefixed with 444 + "xn--" are also not valid IDNA labels. To allow for future use of 445 + mechanisms similar to IDNA, those labels MUST NOT be processed as 446 + 447 + 448 + 449 + 450 + Klensin Standards Track [Page 8] 451 + 452 + RFC 5890 IDNA Definitions August 2010 453 + 454 + 455 + ordinary LDH labels by IDNA-conforming programs and SHOULD NOT be 456 + mixed with IDNA labels in the same zone. 
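The label classes described in Section 2.3.1 above can be sketched in OCaml roughly as follows. This is a minimal illustration, not part of the RFC and not this library's API: the type `ascii_label_class` and the functions `is_ldh_label` and `classify` are hypothetical names. It only separates NR-LDH labels, XN-labels, and other R-LDH labels; deciding whether an XN-label is a real A-label or a Fake A-label would additionally require Punycode decoding and the IDNA validity checks, which are not attempted here.

```ocaml
(* Hypothetical classifier for ASCII labels, after RFC 5890 Section 2.3.1.
   None of these names come from the RFC or from the Publicsuffix library. *)

type ascii_label_class =
  | Nr_ldh       (* LDH label without "--" in the third and fourth positions *)
  | Xn_label     (* R-LDH label beginning with "xn--", case-insensitively *)
  | Other_r_ldh  (* R-LDH label carrying some other two-character tag *)
  | Non_ldh      (* not an LDH label, e.g. "_tcp" or "-abcd" *)

(* A DNS label is at most 63 octets, so the part after "xn--" is at most 59. *)
let max_punycode_part = 63 - String.length "xn--"

let is_ldh_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '-' -> true
  | _ -> false

(* LDH syntax: 1 to 63 octets, only letters, digits and hyphens,
   and no hyphen in the first or last position. *)
let is_ldh_label s =
  let n = String.length s in
  let rec all_ldh i = i >= n || (is_ldh_char s.[i] && all_ldh (i + 1)) in
  n >= 1 && n <= 63 && all_ldh 0 && s.[0] <> '-' && s.[n - 1] <> '-'

let classify s =
  if not (is_ldh_label s) then Non_ldh
  else if String.length s >= 4 && s.[2] = '-' && s.[3] = '-' then
    if String.lowercase_ascii (String.sub s 0 2) = "xn" then Xn_label
    else Other_r_ldh
  else Nr_ldh
```

Under these assumptions, `classify "example"` yields `Nr_ldh`, `classify "XN--abc"` yields `Xn_label`, and `classify "_tcp"` yields `Non_ldh`.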
457 + 458 + These distinctions among possible LDH labels are only of significance 459 + for software that is IDNA-aware or for future extensions that use 460 + extensions based on the same "prefix and encoding" model. For 461 + IDNA-aware systems, the valid label types are: A-labels, U-labels, 462 + and NR-LDH labels. 463 + 464 + IDNA labels come in two flavors: an ACE-encoded form and a Unicode 465 + (native character) form. These are referred to as A-labels and 466 + U-labels, respectively, and are described in detail in the next 467 + section. 468 + 469 + 470 + 471 + 472 + 473 + 474 + 475 + 476 + 477 + 478 + 479 + 480 + 481 + 482 + 483 + 484 + 485 + 486 + 487 + 488 + 489 + 490 + 491 + 492 + 493 + 494 + 495 + 496 + 497 + 498 + 499 + 500 + 501 + 502 + 503 + 504 + 505 + 506 + Klensin Standards Track [Page 9] 507 + 508 + RFC 5890 IDNA Definitions August 2010 509 + 510 + 511 + ASCII Label 512 + __________________________________________________________________ 513 + | | 514 + | ____________________ LDH Label (1) (4) ________________ | 515 + | | ___________________________________ | | 516 + | | |IDN Reserved LDH Labels | | | 517 + | | | ("??--") or R-LDH Labels | _______________ | | 518 + | | | _______________________________ | |NON-RESERVED | | | 519 + | | | | XN-labels | | | LDH Labels | | | 520 + | | | | _____________ ___________ | | | (NR-LDH | | | 521 + | | | | | A-labels | | Fake (3) || | | labels) | | | 522 + | | | | | "xn--"(2) | | A-labels || | |_____________| | | 523 + | | | | |___________| |__________|| | | | 524 + | | | |_____________________________| | | | 525 + | | |_________________________________| | | 526 + | |_______________________________________________________| | 527 + | | 528 + | _____________NON-LDH label________ | 529 + | | ______________________ | | 530 + | | | Underscore labels | | | 531 + | | | e.g., _tcp | | | 532 + | | |____________________| | | 533 + | | | Labels with leading| | | 534 + | | | or trailing | | | 535 + | | | hyphens "-abcd" 
| | | 536 + | | | or "xyz-" | | | 537 + | | | or "-uvw-" | | | 538 + | | |____________________| | | 539 + | | | Labels with other | | | 540 + | | | non-LDH ASCII chars| | | 541 + | | | e.g., #$%_ | | | 542 + | | |____________________| | | 543 + | |________________________________| | 544 + |________________________________________________________________| 545 + 546 + (1) ASCII letters (uppercase and lowercase), digits, 547 + hyphen. Hyphen may not appear in first or last 548 + position. No more than 63 octets. 549 + (2) Note that the string following "xn--" must 550 + be the valid output of the Punycode algorithm 551 + and must be convertible into valid U-label form. 552 + (3) Note that a Fake A-label has a prefix "xn--" 553 + but the remainder of the label is NOT the valid 554 + output of the Punycode algorithm. 555 + (4) LDH label subtypes are indistinguishable to 556 + applications that are not IDNA-aware. 557 + 558 + Figure 1: IDNA and Related DNS Terminology Space -- ASCII Labels 559 + 560 + 561 + 562 + Klensin Standards Track [Page 10] 563 + 564 + RFC 5890 IDNA Definitions August 2010 565 + 566 + 567 + __________________________ 568 + | Non-ASCII | 569 + | | 570 + | ___________________ | 571 + | | U-label (5) | | 572 + | |_________________| | 573 + | | | | 574 + | | Binary Label | | 575 + | | (including | | 576 + | | high bit on) | | 577 + | |_________________| | 578 + | | | | 579 + | | Bit String | | 580 + | | Label | | 581 + | |_________________| | 582 + |________________________| 583 + 584 + (5) To applications that are not IDNA-aware, U-labels 585 + are indistinguishable from Binary ones. 586 + 587 + Figure 2: Non-ASCII Labels 588 + 589 + 2.3.2. Terms for IDN Label Codings 590 + 591 + 2.3.2.1. IDNA-valid strings, A-label, and U-label 592 + 593 + For IDNA-aware applications, the three types of valid labels are 594 + "A-labels", "U-labels", and "NR-LDH labels", each of which is defined 595 + below. 
The relationships among them are illustrated in Figure 1 and 596 + Figure 2. 597 + 598 + o A string is "IDNA-valid" if it meets all of the requirements of 599 + these specifications for an IDNA label. IDNA-valid strings may 600 + appear in either of the two forms defined immediately below, or 601 + may be drawn from the NR-LDH label subset. IDNA-valid strings 602 + must also conform to all basic DNS requirements for labels. These 603 + documents make specific reference to the form appropriate to any 604 + context in which the distinction is important. 605 + 606 + o An "A-label" is the ASCII-Compatible Encoding (ACE, see 607 + Section 2.3.2.5) form of an IDNA-valid string. It must be a 608 + complete label: IDNA is defined for labels, not for parts of them 609 + and not for complete domain names. This means, by definition, 610 + that every A-label will begin with the IDNA ACE prefix, "xn--" 611 + (see Section 2.3.2.5), followed by a string that is a valid output 612 + of the Punycode algorithm [RFC3492] and hence a maximum of 59 613 + ASCII characters in length. The prefix and string together must 614 + conform to all requirements for a label that can be stored in the 615 + 616 + 617 + 618 + Klensin Standards Track [Page 11] 619 + 620 + RFC 5890 IDNA Definitions August 2010 621 + 622 + 623 + DNS including conformance to the rules for LDH labels 624 + (Section 2.3.1). If and only if a string meeting the above 625 + requirements can be decoded into a U-label is it an A-label. 626 + 627 + o A "U-label" is an IDNA-valid string of Unicode characters, in 628 + Normalization Form C (NFC) and including at least one non-ASCII 629 + character, expressed in a standard Unicode Encoding Form (such as 630 + UTF-8). 
It is also subject to the constraints about permitted 631 + characters that are specified in Section 4.2 of the Protocol 632 + document and the rules in the Sections 2 and 3 of the Tables 633 + document, the Bidi constraints in that document if it contains any 634 + character from scripts that are written right to left, and the 635 + symmetry constraint described immediately below. Conversions 636 + between U-labels and A-labels are performed according to the 637 + "Punycode" specification [RFC3492], adding or removing the ACE 638 + prefix as needed. 639 + 640 + To be valid, U-labels and A-labels must obey an important symmetry 641 + constraint. While that constraint may be tested in any of several 642 + ways, an A-label A1 must be capable of being produced by conversion 643 + from a U-label U1, and that U-label U1 must be capable of being 644 + produced by conversion from A-label A1. Among other things, this 645 + implies that both U-labels and A-labels must be strings in Unicode 646 + NFC [Unicode-UAX15] normalized form. These strings MUST contain only 647 + characters specified elsewhere in this document series, and only in 648 + the contexts indicated as appropriate. 649 + 650 + Any rules or conventions that apply to DNS labels in general apply to 651 + whichever of the U-label or A-label would be more restrictive. There 652 + are two exceptions to this principle. First, the restriction to 653 + ASCII characters does not apply to the U-label. Second, expansion of 654 + the A-label form to a U-label may produce strings that are much 655 + longer than the normal 63 octet DNS limit (potentially up to 252 656 + characters) due to the compression efficiency of the Punycode 657 + algorithm. Such extended-length U-labels are valid from the 658 + standpoint of IDNA, but caution should be exercised as shorter limits 659 + may be imposed by some applications. 
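The symmetry constraint described above can be sketched as a round-trip check. This is only an illustration under stated assumptions: the converters `to_u` (A-label to U-label) and `to_a` (U-label to A-label) are hypothetical parameters supplied by the caller, standing in for a real Punycode and IDNA implementation, and the input is lowercased first on the assumption that A-labels are produced only in lowercase.

```ocaml
(* Hypothetical round-trip check for the A-label/U-label symmetry constraint.
   [to_u] and [to_a] are caller-supplied converters returning [None] on
   failure; they are assumptions, not functions defined by the RFC. *)
let is_symmetric ~to_u ~to_a a1 =
  let a1 = String.lowercase_ascii a1 in
  match to_u a1 with
  | None -> false                      (* not decodable: cannot be an A-label *)
  | Some u1 ->
    (match to_a u1 with
     | Some a2 -> String.equal a1 a2   (* U1 must convert back to exactly A1 *)
     | None -> false)

(* Stub converters backed by a one-entry table, for demonstration only. *)
let stub_to_u = function "xn--bcher-kva" -> Some "bücher" | _ -> None
let stub_to_a = function "bücher" -> Some "xn--bcher-kva" | _ -> None
```

With the stub table, `is_symmetric ~to_u:stub_to_u ~to_a:stub_to_a "xn--bcher-kva"` holds, while an XN-label that the decoder rejects does not pass.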
660 + 661 + For context, applications that are not IDNA-aware treat all LDH 662 + labels as valid for appearance in DNS zone files and queries and some 663 + of them may permit additional types of labels (i.e., not impose the 664 + LDH restriction). IDNA-aware applications permit only A-labels and 665 + NR-LDH labels to appear in zone files and queries. U-labels can 666 + appear, along with the other two, in presentation and user interface 667 + forms, and in protocols that use IDNA forms but that do not involve 668 + the DNS itself. 669 + 670 + 671 + 672 + 673 + 674 + Klensin Standards Track [Page 12] 675 + 676 + RFC 5890 IDNA Definitions August 2010 677 + 678 + 679 + Specifically, for IDNA-aware applications and contexts, the three 680 + allowed categories are A-label, U-label, and NR-LDH label. Of the 681 + Reserved LDH labels (R-LDH labels) only A-labels are valid for IDNA 682 + use. 683 + 684 + Strings that appear to be A-labels or U-labels are processed in 685 + various operations of the Protocol document [RFC5891]. Those strings 686 + are not yet demonstrably conformant with the conditions outlined 687 + above because they are in the process of validation. Such strings 688 + may be referred to as "unvalidated", "putative", or "apparent", or as 689 + being "in the form of" one of the label types to indicate that they 690 + have not been verified to meet the specified conformance 691 + requirements. 692 + 693 + Unvalidated A-labels are known only to be XN-labels, while Fake 694 + A-labels have been demonstrated to fail some of the A-label tests. 695 + Similarly, unvalidated U-labels are simply non-ASCII labels that may 696 + or may not meet the requirements for U-labels. 697 + 698 + 2.3.2.2. NR-LDH Label 699 + 700 + These specifications use the term "NR-LDH label" strictly to refer to 701 + an all-ASCII label that obeys the LDH label syntax discussed in 702 + Section 2.3.1 and that is neither an IDN nor a label form reserved by 703 + IDNA (R-LDH label). 
It should be stressed that all A-labels obey the 704 + "hostname" [RFC0952] rules other than the length restriction in those 705 + rules. 706 + 707 + 2.3.2.3. Internationalized Domain Name and Internationalized Label 708 + 709 + An "internationalized domain name" (IDN) is a domain name that 710 + contains at least one A-label or U-label, but that otherwise may 711 + contain any mixture of NR-LDH labels, A-labels, or U-labels. Just as 712 + has been the case with ASCII names, some DNS zone administrators may 713 + impose restrictions, beyond those imposed by DNS or IDNA, on the 714 + characters or strings that may be registered as labels in their 715 + zones. Because of the diversity of characters that can be used in a 716 + U-label and the confusion they might cause, such restrictions are 717 + mandatory for IDN registries and zones even though the particular 718 + restrictions are not part of these specifications (the issue is 719 + discussed in more detail in Section 4.3 of the Protocol document 720 + [RFC5891]). Because these restrictions, commonly known as "registry 721 + restrictions", only affect what can be registered and not lookup 722 + processing, they have no effect on the syntax or semantics of DNS 723 + protocol messages; a query for a name that matches no records will 724 + yield the same response regardless of the reason why it is not in the 725 + zone. Clients issuing queries or interpreting responses cannot be 726 + 727 + 728 + 729 + 730 + Klensin Standards Track [Page 13] 731 + 732 + RFC 5890 IDNA Definitions August 2010 733 + 734 + 735 + assumed to have any knowledge of zone-specific restrictions or 736 + conventions. See the section on registration policy in the Rationale 737 + document [RFC5894] for additional discussion. 738 + 739 + "Internationalized label" is used when a term is needed to refer to a 740 + single label of an IDN, i.e., one that might be any of an NR-LDH 741 + label, A-label, or U-label.
There are some standardized DNS label 742 + formats, such as the "underscore labels" used for service location 743 + (SRV) records [RFC2782], that do not fall into any of the three 744 + categories and hence are not internationalized labels. 745 + 746 + 2.3.2.4. Label Equivalence 747 + 748 + In IDNA, equivalence of labels is defined in terms of the A-labels. 749 + If the A-labels are equal in a case-independent comparison, then the 750 + labels are considered equivalent, no matter how they are represented. 751 + Because of the isomorphism of A-labels and U-labels in IDNA2008, it 752 + is possible to compare U-labels directly; see the Protocol document 753 + [RFC5891] for details. Traditional LDH labels already have a notion 754 + of equivalence: within that list of characters, uppercase and 755 + lowercase are considered equivalent. The IDNA notion of equivalence 756 + is an extension of that older notion but, because the protocol does 757 + not specify any mandatory mapping and only those isomorphic forms are 758 + considered, the only equivalents are: 759 + 760 + o Exact (bit-string identity) matches between a pair of U-labels. 761 + 762 + o Matches between a pair of A-labels, using normal DNS 763 + case-insensitive matching rules. 764 + 765 + o Equivalence between a U-label and an A-label determined by 766 + translating the U-label form into an A-label form and then testing 767 + for a match between the A-labels using normal DNS case-insensitive 768 + matching rules. 769 + 770 + 2.3.2.5. ACE Prefix 771 + 772 + The "ACE prefix" is defined in this document to be a string of ASCII 773 + characters, "xn--", that appears at the beginning of every A-label. 774 + "ACE" stands for "ASCII-Compatible Encoding". 775 + 776 + 2.3.2.6. Domain Name Slot 777 + 778 + A "domain name slot" is defined in this document to be a protocol 779 + element or a function argument or a return value (and so on) 780 + explicitly designated for carrying a domain name. 
Examples of domain 781 + name slots include the QNAME field of a DNS query; the name argument 782 + of the gethostbyname() or getaddrinfo() standard C library functions; 783 + 784 + 785 + 786 + Klensin Standards Track [Page 14] 787 + 788 + RFC 5890 IDNA Definitions August 2010 789 + 790 + 791 + the part of an email address following the at sign ("@") in the 792 + parameter to the SMTP MAIL or RCPT commands or the "From:" field of 793 + an email message header; and the host portion of the URI in the "src" 794 + attribute of an HTML "<IMG>" tag. A string that has the syntax of a 795 + domain name but that appears in general text is not in a domain name 796 + slot. For example, a domain name appearing in the plain text body of 797 + an email message is not occupying a domain name slot. 798 + 799 + An "IDNA-aware domain name slot" is defined for this set of documents 800 + to be a domain name slot explicitly designated for carrying an 801 + internationalized domain name as defined in this document. The 802 + designation may be static (for example, in the specification of the 803 + protocol or interface) or dynamic (for example, as a result of 804 + negotiation in an interactive session). 805 + 806 + Name slots that are not IDNA-aware obviously include any domain name 807 + slot whose specification predates IDNA. Note that the requirements 808 + of some protocols that use the DNS for data storage prevent the use 809 + of IDNs. For example, the format required for the underscore labels 810 + used by the service location protocol [RFC2782] precludes 811 + representation of a non-ASCII label in the DNS using A-labels because 812 + those SRV-related labels must start with underscores. Of course, 813 + non-ASCII IDN labels may be part of a domain name that also includes 814 + underscore labels. 815 + 816 + 2.3.3. 
Order of Characters in Labels 817 + 818 + Because IDN labels may contain characters that are read, and 819 + preferentially displayed, from right to left, there is a potential 820 + ambiguity about which character in a label is "first". For the 821 + purposes of these specifications, labels are considered, and 822 + characters numbered, strictly in the order in which they appear "on 823 + the wire". That order is equivalent to the leftmost character being 824 + treated as first in a label that is read left to right and to the 825 + rightmost character being first in a label that is read right to 826 + left. The Bidi specification contains additional discussion of the 827 + conditions that influence reading order. 828 + 829 + 2.3.4. Punycode is an Algorithm, Not a Name or Adjective 830 + 831 + There has been some confusion about whether a "Punycode string" does 832 + or does not include the ACE prefix and about whether it is required 833 + that such strings could have been the output of the ToASCII operation 834 + (see RFC 3490, Section 4 [RFC3490]). This specification discourages 835 + the use of the term "Punycode" to describe anything but the encoding 836 + method and algorithm of RFC 3492 [RFC3492]. The terms defined above 837 + are preferred as much more clear than the term "Punycode string". 838 + 839 + 840 + 841 + 842 + Klensin Standards Track [Page 15] 843 + 844 + RFC 5890 IDNA Definitions August 2010 845 + 846 + 847 + 3. IANA Considerations 848 + 849 + IANA actions for this version of IDNA (IDNA2008) are specified in the 850 + Tables document [RFC5892]. An overview of the relationships among 851 + the various IANA registries appears in the Rationale document 852 + [RFC5894]. This document does not specify any actions for IANA. 853 + 854 + 4. Security Considerations 855 + 856 + 4.1. General Issues 857 + 858 + Security on the Internet partly relies on the DNS. 
Thus, any change 859 + to the characteristics of the DNS can change the security of much of 860 + the Internet. 861 + 862 + Domain names are used by users to identify and connect to Internet 863 + hosts and other network resources. The security of the Internet is 864 + compromised if a user entering a single internationalized name is 865 + connected to different servers based on different interpretations of 866 + the internationalized domain name. In addition to characters that 867 + are permitted by IDNA2003 and its mapping conventions (see 868 + Section 4.6), the current specification changes the interpretation of 869 + a few characters that were mapped to others in the earlier version; 870 + zone administrators should be aware of the problems that this might 871 + raise and take appropriate measures. The context for this issue is 872 + discussed in more detail in the Rationale document [RFC5894]. 873 + 874 + In addition to the Security Considerations material that appears in 875 + this document, the Bidi document [RFC5893] contains a discussion of 876 + security issues specific to labels containing characters from scripts 877 + that are normally written right to left. 878 + 879 + 4.2. U-label Lengths 880 + 881 + Labels associated with the DNS have traditionally been limited to 63 882 + octets by the general restrictions in RFC 1035 and by the need to 883 + treat them as a six-bit string length followed by the string in 884 + actual calls to the DNS. That format is used in some other 885 + applications and, in general, that representations of domain names as 886 + dot-separated labels and as length-string pairs have been treated as 887 + interchangeable. 
Because A-labels (the form actually used in the 888 + DNS) are potentially much more compressed than UTF-8 (and UTF-8 is, 889 + in general, more compressed that UTF-16 or UTF-32), U-labels that 890 + obey all of the relevant symmetry (and other) constraints of these 891 + documents may be quite a bit longer, potentially up to 252 characters 892 + (Unicode code points). A fully-qualified domain name containing 893 + several such labels can obviously also exceed the nominal 255 octet 894 + 895 + 896 + 897 + 898 + Klensin Standards Track [Page 16] 899 + 900 + RFC 5890 IDNA Definitions August 2010 901 + 902 + 903 + limit for such names. Application authors using U-labels must exert 904 + due caution to avoid buffer overflow and truncation errors and 905 + attacks in contexts where shorter strings are expected. 906 + 907 + 4.3. Local Character Set Issues 908 + 909 + When systems use local character sets other than ASCII and Unicode, 910 + these specifications leave the problem of converting between the 911 + local character set and Unicode up to the application or local 912 + system. If different applications (or different versions of one 913 + application) implement different rules for conversions among coded 914 + character sets, they could interpret the same name differently and 915 + contact different servers. This problem is not solved by security 916 + protocols, such as Transport Layer Security (TLS) [RFC5246], that do 917 + not take local character sets into account. 918 + 919 + 4.4. Visually Similar Characters 920 + 921 + To help prevent confusion between characters that are visually 922 + similar (sometimes called "confusables"), it is suggested that 923 + implementations provide visual indications where a domain name 924 + contains multiple scripts, especially when the scripts contain 925 + characters that are easily confused visually, such as an omicron in 926 + Greek mixed with Latin text. 
Such mechanisms can also be used to 927 + show when a name contains a mixture of Simplified Chinese characters 928 + with Traditional ones that have Simplified forms, or to distinguish 929 + zero and one from uppercase "O" and lowercase "L". DNS zone 930 + administrators may impose restrictions (subject to the limitations 931 + identified elsewhere in these documents) that try to minimize 932 + characters that have similar appearance or similar interpretations. 933 + 934 + If multiple characters appear in a label and the label consists only 935 + of characters in one script, individual characters that might be 936 + confused with others if compared separately may be unambiguous and 937 + non-confusing. On the other hand, that observation makes labels 938 + containing characters from more than one script (often called "mixed- 939 + script labels") even more risky -- users will tend to see what they 940 + expect to see and context is a powerful reinforcement to perception. 941 + At the same time, while the risks associated with mixed-script labels 942 + are clear, simply prohibiting them will not eliminate problems, 943 + especially where closely related scripts are involved. For example, 944 + there are many strings that are entirely in Greek or Cyrillic scripts 945 + that can be confused with each other or with Latin script strings. 946 + 947 + It is worth noting that there are no comprehensive technical 948 + solutions to the problems of confusable characters. One can reduce 949 + the extent of the problems in various ways, but probably never 950 + 951 + 952 + 953 + 954 + Klensin Standards Track [Page 17] 955 + 956 + RFC 5890 IDNA Definitions August 2010 957 + 958 + 959 + eliminate it. Some specific suggestions about identification and 960 + handling of confusable characters appear in a Unicode Consortium 961 + publication [Unicode-UTR36]. 962 + 963 + 4.5. 
IDNA Lookup, Registration, and the Base DNS Specifications 964 + 965 + The Protocol specification [RFC5891] describes procedures for 966 + registering and looking up labels that are not compatible with the 967 + preferred syntax described in the base DNS specifications (see 968 + Section 2.3.1) because they contain non-ASCII characters. These 969 + procedures depend on the use of a special ASCII-compatible encoding 970 + form that contains only characters permitted in hostnames by those 971 + earlier specifications. The encoding used is Punycode [RFC3492]. No 972 + security issues such as string length increases or new allowed values 973 + are introduced by the encoding process or the use of these encoded 974 + values, apart from those introduced by the ACE encoding itself. 975 + 976 + Domain names (or portions of them) are sometimes compared against a 977 + set of domains to be given special treatment if a match occurs, e.g., 978 + treated as more privileged than others or blocked in some way. In 979 + such situations, it is especially important that the comparisons be 980 + done properly, as specified in the "Requirements" section of the 981 + Protocol document [RFC5891]. For labels already in ASCII form, the 982 + proper comparison reduces to the same case-insensitive ASCII 983 + comparison that has always been used for ASCII labels although 984 + IDNA-aware applications are expected to look up only A-labels and 985 + NR-LDH labels, i.e., to avoid looking up R-LDH labels that are not 986 + A-labels. 987 + 988 + The introduction of IDNA meant that any existing labels that start 989 + with the ACE prefix would be construed as A-labels, at least until 990 + they failed one of the relevant tests, whether or not that was the 991 + intent of the zone administrator or registrant. There is no evidence 992 + that this has caused any practical problems since RFC 3490 was 993 + adopted, but the risk still exists in principle. 994 + 995 + 4.6. 
Legacy IDN Label Strings 996 + 997 + The URI Standard [RFC3986] and a number of application specifications 998 + (e.g., SMTP [RFC5321] and HTTP [RFC2616]) do not permit non-ASCII 999 + labels in DNS names used with those protocols, i.e., only the A-label 1000 + form of IDNs is permitted in those contexts. If only A-labels are 1001 + used, differences in interpretation between IDNA2003 and this version 1002 + arise only for characters whose interpretation have actually changed 1003 + (e.g., characters, such as ZWJ and ZWNJ, that were mapped to nothing 1004 + in IDNA2003 and that are considered legitimate in some contexts by 1005 + these specifications). Despite that prohibition, there are a 1006 + significant number of files and databases on the Internet in which 1007 + 1008 + 1009 + 1010 + Klensin Standards Track [Page 18] 1011 + 1012 + RFC 5890 IDNA Definitions August 2010 1013 + 1014 + 1015 + domain name strings appear in native-character form; a subset of 1016 + those strings use native-character labels that require IDNA2003 1017 + mapping to produce valid A-labels. The treatment of such labels will 1018 + vary by types of applications and application-designer preference: in 1019 + some situations, warnings to the user or outright rejection may be 1020 + appropriate; in others, it may be preferable to attempt to apply the 1021 + earlier mappings if lookup strictly conformant to these 1022 + specifications fails or even to do lookups under both sets of rules. 1023 + This general situation is discussed in more detail in the Rationale 1024 + document [RFC5894]. However, in the absence of care by registries 1025 + about how strings that could have different interpretations under 1026 + IDNA2003 and the current specification are handled, it is possible 1027 + that the differences could be used as a component of name-matching or 1028 + name-confusion attacks. Such care is therefore appropriate. 1029 + 1030 + 4.7. 
Security Differences from IDNA2003 1031 + 1032 + The registration and lookup models described in this set of documents 1033 + change the mechanisms available for lookup applications to determine 1034 + the validity of labels they encounter. In some respects, the ability 1035 + to test is strengthened. For example, putative labels that contain 1036 + unassigned code points will now be rejected, while IDNA2003 permitted 1037 + them (see the Rationale document [RFC5894] for a discussion of the 1038 + reasons for this). On the other hand, the Protocol specification no 1039 + longer assumes that the application that looks up a name will be able 1040 + to determine, and apply, information about the protocol version used 1041 + in registration. In theory, that may increase risk since the 1042 + application will be able to do less pre-lookup validation. In 1043 + practice, the protection afforded by that test has been largely 1044 + illusory for reasons explained in RFC 4690 [RFC4690] and elsewhere in 1045 + these documents. 1046 + 1047 + Any change to the Stringprep [RFC3454] procedure that is profiled and 1048 + used in IDNA2003, or, more broadly, the IETF's model of the use of 1049 + internationalized character strings in different protocols, creates 1050 + some risk of inadvertent changes to those protocols, invalidating 1051 + deployed applications or databases, and so on. But these 1052 + specifications do not change Stringprep at all; they merely bypass 1053 + it. Because these documents do not depend on Stringprep, the 1054 + question of upgrading other protocols that do have that dependency 1055 + can be left to experts on those protocols: the IDNA changes and 1056 + possible upgrades to security protocols or conventions are 1057 + independent issues. 1058 + 1059 + 1060 + 1061 + 1062 + 1063 + 1064 + 1065 + 1066 + Klensin Standards Track [Page 19] 1067 + 1068 + RFC 5890 IDNA Definitions August 2010 1069 + 1070 + 1071 + 4.8. 
Summary 1072 + 1073 + No mechanism involving names or identifiers alone can protect against 1074 + a wide variety of security threats and attacks that are largely 1075 + independent of the naming or identification system. These attacks 1076 + include spoofed pages, DNS query trapping and diversion, and so on. 1077 + 1078 + 5. Acknowledgments 1079 + 1080 + The initial version of this document was created largely by 1081 + extracting text from early draft versions of the Rationale document 1082 + [RFC5894]. See the section of this name and the one entitled 1083 + "Contributors", in it. 1084 + 1085 + Specific textual suggestions after the extraction process came from 1086 + Vint Cerf, Lisa Dusseault, Bill McQuillan, Andrew Sullivan, and Ken 1087 + Whistler. Other changes were made in response to more general 1088 + comments, lists of concerns or specific errors from participants in 1089 + the Working Group and other observers, including Lyman Chapin, James 1090 + Mitchell, Subramanian Moonesamy, and Dan Winship. 1091 + 1092 + 6. References 1093 + 1094 + 6.1. Normative References 1095 + 1096 + [ASCII] American National Standards Institute (formerly United 1097 + States of America Standards Institute), "USA Code for 1098 + Information Interchange", ANSI X3.4-1968, 1968. ANSI 1099 + X3.4-1968 has been replaced by newer versions with 1100 + slight modifications, but the 1968 version remains 1101 + definitive for the Internet. 1102 + 1103 + [RFC1034] Mockapetris, P., "Domain names - concepts and 1104 + facilities", STD 13, RFC 1034, November 1987. 1105 + 1106 + [RFC1035] Mockapetris, P., "Domain names - implementation and 1107 + specification", STD 13, RFC 1035, November 1987. 1108 + 1109 + [RFC1123] Braden, R., "Requirements for Internet Hosts - 1110 + Application and Support", STD 3, RFC 1123, October 1989. 1111 + 1112 + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1113 + Requirement Levels", BCP 14, RFC 2119, March 1997. 
1114 + 1115 + 1116 + 1117 + 1118 + 1119 + 1120 + 1121 + 1122 + Klensin Standards Track [Page 20] 1123 + 1124 + RFC 5890 IDNA Definitions August 2010 1125 + 1126 + 1127 + [Unicode-UAX15] 1128 + The Unicode Consortium, "Unicode Standard Annex #15: 1129 + Unicode Normalization Forms, Revision 31", 1130 + September 2009, 1131 + <http://www.unicode.org/reports/tr15/tr15-31.html>. 1132 + 1133 + [Unicode52] The Unicode Consortium. The Unicode Standard, Version 1134 + 5.2.0, defined by: "The Unicode Standard, Version 1135 + 5.2.0", (Mountain View, CA: The Unicode Consortium, 1136 + 2009. ISBN 978-1-936213-00-9). 1137 + <http://www.unicode.org/versions/Unicode5.2.0/>. 1138 + 1139 + 6.2. Informative References 1140 + 1141 + [IDNA2008-Mapping] 1142 + Resnick, P. and P. Hoffman, "Mapping Characters in 1143 + Internationalized Domain Names for Applications (IDNA)", 1144 + Work in Progress, April 2010. 1145 + 1146 + [RFC0952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD 1147 + Internet host table specification", RFC 952, 1148 + October 1985. 1149 + 1150 + [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS 1151 + Specification", RFC 2181, July 1997. 1152 + 1153 + [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 1154 + Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 1155 + Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 1156 + 1157 + [RFC2673] Crawford, M., "Binary Labels in the Domain Name System", 1158 + RFC 2673, August 1999. 1159 + 1160 + [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for 1161 + specifying the location of services (DNS SRV)", 1162 + RFC 2782, February 2000. 1163 + 1164 + [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of 1165 + Internationalized Strings ("stringprep")", RFC 3454, 1166 + December 2002. 1167 + 1168 + [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, 1169 + "Internationalizing Domain Names in Applications 1170 + (IDNA)", RFC 3490, March 2003. 1171 + 1172 + [RFC3491] Hoffman, P. and M. 
Blanchet, "Nameprep: A Stringprep 1173 + Profile for Internationalized Domain Names (IDN)", 1174 + RFC 3491, March 2003. 1175 + 1176 + 1177 + 1178 + Klensin Standards Track [Page 21] 1179 + 1180 + RFC 5890 IDNA Definitions August 2010 1181 + 1182 + 1183 + [RFC3492] Costello, A., "Punycode: A Bootstring encoding of 1184 + Unicode for Internationalized Domain Names in 1185 + Applications (IDNA)", RFC 3492, March 2003. 1186 + 1187 + [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 1188 + Resource Identifier (URI): Generic Syntax", STD 66, 1189 + RFC 3986, January 2005. 1190 + 1191 + [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review 1192 + and Recommendations for Internationalized Domain Names 1193 + (IDNs)", RFC 4690, September 2006. 1194 + 1195 + [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer 1196 + Security (TLS) Protocol Version 1.2", RFC 5246, 1197 + August 2008. 1198 + 1199 + [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, 1200 + October 2008. 1201 + 1202 + [RFC5891] Klensin, J., "Internationalized Domain Names in 1203 + Applications (IDNA): Protocol", RFC 5891, August 2010. 1204 + 1205 + [RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and 1206 + Internationalized Domain Names for Applications (IDNA)", 1207 + RFC 5892, August 2010. 1208 + 1209 + [RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for 1210 + Internationalized Domain Names for Applications (IDNA)", 1211 + RFC 5893, August 2010. 1212 + 1213 + [RFC5894] Klensin, J., "Internationalized Domain Names for 1214 + Applications (IDNA): Background, Explanation, and 1215 + Rationale", RFC 5894, August 2010. 1216 + 1217 + [Unicode-UTR36] 1218 + The Unicode Consortium, "Unicode Technical Report #36: 1219 + Unicode Security Considerations, Revision 7", July 2008, 1220 + <http://www.unicode.org/reports/tr36/tr36-7.html>. 
1221 + 1222 + 1223 + 1224 + 1225 + 1226 + 1227 + 1228 + 1229 + 1230 + 1231 + 1232 + 1233 + 1234 + Klensin Standards Track [Page 22] 1235 + 1236 + RFC 5890 IDNA Definitions August 2010 1237 + 1238 + 1239 + Author's Address 1240 + 1241 + John C Klensin 1242 + 1770 Massachusetts Ave, Ste 322 1243 + Cambridge, MA 02140 1244 + USA 1245 + 1246 + Phone: +1 617 245 1457 1247 + EMail: john+ietf@jck.com 1248 + 1249 + 1250 + 1251 + 1252 + 1253 + 1254 + 1255 + 1256 + 1257 + 1258 + 1259 + 1260 + 1261 + 1262 + 1263 + 1264 + 1265 + 1266 + 1267 + 1268 + 1269 + 1270 + 1271 + 1272 + 1273 + 1274 + 1275 + 1276 + 1277 + 1278 + 1279 + 1280 + 1281 + 1282 + 1283 + 1284 + 1285 + 1286 + 1287 + 1288 + 1289 + 1290 + Klensin Standards Track [Page 23] 1291 +
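Two points from the Definitions document above lend themselves to a small illustration: the ACE prefix (Section 2.3.2.5) and the 63-octet limit on labels as stored in the DNS (Section 4.2). The following is a minimal OCaml sketch, not part of the RFC and not part of the Publicsuffix API; all names in it are hypothetical. Matching the prefix case-insensitively is an assumption based on the normal DNS matching rules. A label that passes `has_ace_prefix` is only a candidate A-label, since Section 4.5 notes that such labels can still fail the relevant tests.

```ocaml
(* Illustrative sketch only; every name here is an assumption. *)

(* The ACE prefix defined in RFC 5890, Section 2.3.2.5. *)
let ace_prefix = "xn--"

(* True when [label] begins with the ACE prefix. The prefix is matched
   case-insensitively (an assumption, following DNS matching rules).
   This does NOT prove that [label] is a valid A-label. *)
let has_ace_prefix (label : string) : bool =
  let n = String.length ace_prefix in
  String.length label >= n
  && String.equal (String.lowercase_ascii (String.sub label 0 n)) ace_prefix

(* Traditional DNS limit on a single label, in octets (Section 4.2). *)
let max_label_octets = 63

(* True when the ASCII form of [label] is non-empty and within the limit.
   For a UTF-8 U-label, String.length counts octets rather than code
   points, so this check is only meaningful for the A-label form. *)
let label_fits (label : string) : bool =
  let len = String.length label in
  len > 0 && len <= max_label_octets
```

A caller holding a U-label would need to convert it to an A-label before applying `label_fits`, because the U-label form may be considerably longer than 63 octets.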
+955
spec/rfc5891.txt
··· 1 + 2 + 3 + 4 + 5 + 6 + 7 + Internet Engineering Task Force (IETF) J. Klensin 8 + Request for Comments: 5891 August 2010 9 + Obsoletes: 3490, 3491 10 + Updates: 3492 11 + Category: Standards Track 12 + ISSN: 2070-1721 13 + 14 + 15 + Internationalized Domain Names in Applications (IDNA): Protocol 16 + 17 + Abstract 18 + 19 + This document is the revised protocol definition for 20 + Internationalized Domain Names (IDNs). The rationale for changes, 21 + the relationship to the older specification, and important 22 + terminology are provided in other documents. This document specifies 23 + the protocol mechanism, called Internationalized Domain Names in 24 + Applications (IDNA), for registering and looking up IDNs in a way 25 + that does not require changes to the DNS itself. IDNA is only meant 26 + for processing domain names, not free text. 27 + 28 + Status of This Memo 29 + 30 + This is an Internet Standards Track document. 31 + 32 + This document is a product of the Internet Engineering Task Force 33 + (IETF). It represents the consensus of the IETF community. It has 34 + received public review and has been approved for publication by the 35 + Internet Engineering Steering Group (IESG). Further information on 36 + Internet Standards is available in Section 2 of RFC 5741. 37 + 38 + Information about the current status of this document, any errata, 39 + and how to provide feedback on it may be obtained at 40 + http://www.rfc-editor.org/info/rfc5891. 41 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50 + 51 + 52 + 53 + 54 + 55 + 56 + 57 + 58 + Klensin Standards Track [Page 1] 59 + 60 + RFC 5891 IDNA2008 Protocol August 2010 61 + 62 + 63 + Copyright Notice 64 + 65 + Copyright (c) 2010 IETF Trust and the persons identified as the 66 + document authors. All rights reserved. 
67 + 68 + This document is subject to BCP 78 and the IETF Trust's Legal 69 + Provisions Relating to IETF Documents 70 + (http://trustee.ietf.org/license-info) in effect on the date of 71 + publication of this document. Please review these documents 72 + carefully, as they describe your rights and restrictions with respect 73 + to this document. Code Components extracted from this document must 74 + include Simplified BSD License text as described in Section 4.e of 75 + the Trust Legal Provisions and are provided without warranty as 76 + described in the Simplified BSD License. 77 + 78 + This document may contain material from IETF Documents or IETF 79 + Contributions published or made publicly available before November 80 + 10, 2008. The person(s) controlling the copyright in some of this 81 + material may not have granted the IETF Trust the right to allow 82 + modifications of such material outside the IETF Standards Process. 83 + Without obtaining an adequate license from the person(s) controlling 84 + the copyright in such materials, this document may not be modified 85 + outside the IETF Standards Process, and derivative works of it may 86 + not be created outside the IETF Standards Process, except to format 87 + it for publication as an RFC or to translate it into languages other 88 + than English. 89 + 90 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 + 100 + 101 + 102 + 103 + 104 + 105 + 106 + 107 + 108 + 109 + 110 + 111 + 112 + 113 + 114 + Klensin Standards Track [Page 2] 115 + 116 + RFC 5891 IDNA2008 Protocol August 2010 117 + 118 + 119 + Table of Contents 120 + 121 + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 122 + 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 123 + 3. Requirements and Applicability . . . . . . . . . . . . . . . . 5 124 + 3.1. Requirements . . . . . . . . . . . . . . . . . . . . . . . 5 125 + 3.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 5 126 + 3.2.1. DNS Resource Records . . . 
. . . . . . . . . . . . . . 6 127 + 3.2.2. Non-Domain-Name Data Types Stored in the DNS . . . . . 6 128 + 4. Registration Protocol . . . . . . . . . . . . . . . . . . . . 6 129 + 4.1. Input to IDNA Registration . . . . . . . . . . . . . . . . 7 130 + 4.2. Permitted Character and Label Validation . . . . . . . . . 7 131 + 4.2.1. Input Format . . . . . . . . . . . . . . . . . . . . . 7 132 + 4.2.2. Rejection of Characters That Are Not Permitted . . . . 8 133 + 4.2.3. Label Validation . . . . . . . . . . . . . . . . . . . 8 134 + 4.2.4. Registration Validation Requirements . . . . . . . . . 9 135 + 4.3. Registry Restrictions . . . . . . . . . . . . . . . . . . 9 136 + 4.4. Punycode Conversion . . . . . . . . . . . . . . . . . . . 9 137 + 4.5. Insertion in the Zone . . . . . . . . . . . . . . . . . . 10 138 + 5. Domain Name Lookup Protocol . . . . . . . . . . . . . . . . . 10 139 + 5.1. Label String Input . . . . . . . . . . . . . . . . . . . . 10 140 + 5.2. Conversion to Unicode . . . . . . . . . . . . . . . . . . 10 141 + 5.3. A-label Input . . . . . . . . . . . . . . . . . . . . . . 10 142 + 5.4. Validation and Character List Testing . . . . . . . . . . 11 143 + 5.5. Punycode Conversion . . . . . . . . . . . . . . . . . . . 13 144 + 5.6. DNS Name Resolution . . . . . . . . . . . . . . . . . . . 13 145 + 6. Security Considerations . . . . . . . . . . . . . . . . . . . 13 146 + 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 147 + 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 13 148 + 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 14 149 + 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14 150 + 10.1. Normative References . . . . . . . . . . . . . . . . . . . 14 151 + 10.2. Informative References . . . . . . . . . . . . . . . . . . 15 152 + Appendix A. Summary of Major Changes from IDNA2003 . . . . . . . 
17 153 + 154 + 155 + 156 + 157 + 158 + 159 + 160 + 161 + 162 + 163 + 164 + 165 + 166 + 167 + 168 + 169 + 170 + Klensin Standards Track [Page 3] 171 + 172 + RFC 5891 IDNA2008 Protocol August 2010 173 + 174 + 175 + 1. Introduction 176 + 177 + This document supplies the protocol definition for Internationalized 178 + Domain Names in Applications (IDNA), with the version specified here 179 + known as IDNA2008. Essential definitions and terminology for 180 + understanding this document and a road map of the collection of 181 + documents that make up IDNA2008 appear in a separate Definitions 182 + document [RFC5890]. Appendix A discusses the relationship between 183 + this specification and the earlier version of IDNA (referred to here 184 + as "IDNA2003"). The rationale for these changes, along with 185 + considerable explanatory material and advice to zone administrators 186 + who support IDNs, is provided in another document, known informally 187 + in this series as the "Rationale document" [RFC5894]. 188 + 189 + IDNA works by allowing applications to use certain ASCII [ASCII] 190 + string labels (beginning with a special prefix) to represent 191 + non-ASCII name labels. Lower-layer protocols need not be aware of 192 + this; therefore, IDNA does not change any infrastructure. In 193 + particular, IDNA does not depend on any changes to DNS servers, 194 + resolvers, or DNS protocol elements, because the ASCII name service 195 + provided by the existing DNS can be used for IDNA. 196 + 197 + IDNA applies only to a specific subset of DNS labels. The base DNS 198 + standards [RFC1034] [RFC1035] and their various updates specify how 199 + to combine labels into fully-qualified domain names and parse labels 200 + out of those names. 201 + 202 + This document describes two separate protocols, one for IDN 203 + registration (Section 4) and one for IDN lookup (Section 5). These 204 + two protocols share some terminology, reference data, and operations. 205 + 206 + 2. 
Terminology 207 + 208 + As mentioned above, terminology used as part of the definition of 209 + IDNA appears in the Definitions document [RFC5890]. It is worth 210 + noting that some of this terminology overlaps with, and is consistent 211 + with, that used in Unicode or other character set standards and the 212 + DNS. Readers of this document are assumed to be familiar with the 213 + associated Definitions document and with the DNS-specific terminology 214 + in RFC 1034 [RFC1034]. 215 + 216 + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 217 + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 218 + document are to be interpreted as described in BCP 14, RFC 2119 219 + [RFC2119]. 220 + 221 + 222 + 223 + 224 + 225 + 226 + Klensin Standards Track [Page 4] 227 + 228 + RFC 5891 IDNA2008 Protocol August 2010 229 + 230 + 231 + 3. Requirements and Applicability 232 + 233 + 3.1. Requirements 234 + 235 + IDNA makes the following requirements: 236 + 237 + 1. Whenever a domain name is put into a domain name slot that is not 238 + IDNA-aware (see Section 2.3.2.6 of the Definitions document 239 + [RFC5890]), it MUST contain only ASCII characters (i.e., its 240 + labels must be either A-labels or NR-LDH labels), unless the DNS 241 + application is not subject to historical recommendations for 242 + "hostname"-style names (see RFC 1034 [RFC1034] and 243 + Section 3.2.1). 244 + 245 + 2. Labels MUST be compared using equivalent forms: either both 246 + A-label forms or both U-label forms. Because A-labels and 247 + U-labels can be transformed into each other without loss of 248 + information, these comparisons are equivalent (however, in 249 + practice, comparison of U-labels requires first verifying that 250 + they actually are U-labels and not just Unicode strings). A pair 251 + of A-labels MUST be compared as case-insensitive ASCII (as with 252 + all comparisons of ASCII DNS labels). 
U-labels MUST be compared 253 + as-is, without case folding or other intermediate steps. While 254 + it is not necessary to validate labels in order to compare them, 255 + successful comparison does not imply validity. In many cases, 256 + not limited to comparison, validation may be important for other 257 + reasons and SHOULD be performed. 258 + 259 + 3. Labels being registered MUST conform to the requirements of 260 + Section 4. Labels being looked up and the lookup process MUST 261 + conform to the requirements of Section 5. 262 + 263 + 3.2. Applicability 264 + 265 + IDNA applies to all domain names in all domain name slots in 266 + protocols except where it is explicitly excluded. It does not apply 267 + to domain name slots that do not use the LDH syntax rules as 268 + described in the Definitions document [RFC5890]. 269 + 270 + Because it uses the DNS, IDNA applies to many protocols that were 271 + specified before it was designed. IDNs occupying domain name slots 272 + in those older protocols MUST be in A-label form until and unless 273 + those protocols and their implementations are explicitly upgraded to 274 + be aware of IDNs and to accept the U-label form. IDNs actually 275 + appearing in DNS queries or responses MUST be A-labels. 276 + 277 + 278 + 279 + 280 + 281 + 282 + Klensin Standards Track [Page 5] 283 + 284 + RFC 5891 IDNA2008 Protocol August 2010 285 + 286 + 287 + IDNA-aware protocols and implementations MAY accept U-labels, 288 + A-labels, or both as those particular protocols specify. IDNA is not 289 + defined for extended label types (see RFC 2671 [RFC2671], Section 3). 290 + 291 + 3.2.1. DNS Resource Records 292 + 293 + IDNA applies only to domain names in the NAME and RDATA fields of DNS 294 + resource records whose CLASS is IN. See the DNS specification 295 + [RFC1035] for precise definitions of these terms. 
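The comparison rule in item 2 of Section 3.1 above can be sketched in OCaml as follows. This is an illustration under stated assumptions, not text from the RFC: the `label` type and `equal_same_form` are hypothetical names, U-labels are assumed to be UTF-8 strings that have already been validated, and conversion between the two forms (Punycode encoding or decoding) is deliberately left out.

```ocaml
(* Illustrative sketch only; the type and function names are assumptions
   and are not part of the Publicsuffix API. *)

type label =
  | A_label of string  (* ASCII-compatible form, e.g. "xn--85x722f" *)
  | U_label of string  (* Unicode form as UTF-8, assumed already validated *)

(* Compare two labels that are in the same form.
   Returns [None] when the forms differ: one side must first be converted
   so that both are A-labels or both are U-labels. *)
let equal_same_form (a : label) (b : label) : bool option =
  match a, b with
  | A_label x, A_label y ->
    (* A-labels: case-insensitive ASCII comparison. *)
    Some (String.equal (String.lowercase_ascii x) (String.lowercase_ascii y))
  | U_label x, U_label y ->
    (* U-labels: compared as-is, with no case folding. *)
    Some (String.equal x y)
  | A_label _, U_label _ | U_label _, A_label _ -> None
```

Returning `None` for mixed forms keeps the sketch honest about what it does not do: it never silently compares an A-label with a U-label. As the requirement itself notes, a successful comparison does not imply that either label is valid.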
296 + 297 + The application of IDNA to DNS resource records depends entirely on 298 + the CLASS of the record, and not on the TYPE except as noted below. 299 + This will remain true, even as new TYPEs are defined, unless a new 300 + TYPE defines TYPE-specific rules. Special naming conventions for SRV 301 + records (and "underscore labels" more generally) are incompatible 302 + with IDNA coding as discussed in the Definitions document [RFC5890], 303 + especially Section 2.3.2.3. Of course, underscore labels may be part 304 + of a domain that uses IDN labels at higher levels in the tree. 305 + 306 + 3.2.2. Non-Domain-Name Data Types Stored in the DNS 307 + 308 + Although IDNA enables the representation of non-ASCII characters in 309 + domain names, that does not imply that IDNA enables the 310 + representation of non-ASCII characters in other data types that are 311 + stored in domain names, specifically in the RDATA field for types 312 + that have structured RDATA format. For example, an email address 313 + local part is stored in a domain name in the RNAME field as part of 314 + the RDATA of an SOA record (e.g., hostmaster@example.com would be 315 + represented as hostmaster.example.com). IDNA does not update the 316 + existing email standards, which allow only ASCII characters in local 317 + parts. Even though work is in progress to define 318 + internationalization for email addresses [RFC4952], changes to the 319 + email address part of the SOA RDATA would require action in, or 320 + updates to, other standards, specifically those that specify the 321 + format of the SOA RR. 322 + 323 + 4. Registration Protocol 324 + 325 + This section defines the model for registering an IDN. The model is 326 + implementation independent; any sequence of steps that produces 327 + exactly the same result for all labels is considered a valid 328 + implementation. 
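The SOA RNAME example in Section 3.2.2 above (hostmaster@example.com represented as hostmaster.example.com) can be sketched in OCaml. This is a hedged illustration, not part of the RFC: `rname_of_mailbox` and `is_ascii` are hypothetical names, and escaping a literal dot in the local part as `\.` is an assumption taken from the master-file convention of RFC 1035. The sketch rejects a non-ASCII local part, reflecting the statement that existing email standards allow only ASCII there; it does not validate or convert the domain part.

```ocaml
(* Illustrative sketch only; all names are assumptions. *)

(* True when every byte of [s] is in the ASCII range. *)
let is_ascii (s : string) : bool =
  let ok = ref true in
  String.iter (fun c -> if Char.code c > 127 then ok := false) s;
  !ok

(* Convert "local@domain" into the domain-name form used in the RNAME
   field of an SOA record. Returns [None] if there is no '@', if either
   side is empty, or if the local part contains non-ASCII bytes. *)
let rname_of_mailbox (mailbox : string) : string option =
  match String.rindex_opt mailbox '@' with
  | None -> None
  | Some i ->
    let local = String.sub mailbox 0 i in
    let domain =
      String.sub mailbox (i + 1) (String.length mailbox - i - 1)
    in
    if local = "" || domain = "" || not (is_ascii local) then None
    else
      (* Escape literal dots in the local part so that they are not read
         as label separators (assumed RFC 1035 master-file convention). *)
      let escaped = String.concat "\\." (String.split_on_char '.' local) in
      Some (escaped ^ "." ^ domain)
```

With these definitions, `rname_of_mailbox "hostmaster@example.com"` evaluates to `Some "hostmaster.example.com"`, matching the example in the text.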
329 + 330 + Note that, while the registration (this section) and lookup protocols 331 + (Section 5) are very similar in most respects, they are not 332 + identical, and implementers should carefully follow the steps 333 + described in this specification. 334 + 335 + 336 + 337 + 338 + Klensin Standards Track [Page 6] 339 + 340 + RFC 5891 IDNA2008 Protocol August 2010 341 + 342 + 343 + 4.1. Input to IDNA Registration 344 + 345 + Registration processes, especially processing by entities (often 346 + called "registrars") who deal with registrants before the request 347 + actually reaches the zone manager ("registry") are outside the scope 348 + of this definition and may differ significantly depending on local 349 + needs. By the time a string enters the IDNA registration process as 350 + described in this specification, it MUST be in Unicode and in 351 + Normalization Form C (NFC [Unicode-UAX15]). Entities responsible for 352 + zone files ("registries") MUST accept only the exact string for which 353 + registration is requested, free of any mappings or local adjustments. 354 + They MAY accept that input in any of three forms: 355 + 356 + 1. As a pair of A-label and U-label. 357 + 358 + 2. As an A-label only. 359 + 360 + 3. As a U-label only. 361 + 362 + The first two of these forms are RECOMMENDED because the use of 363 + A-labels avoids any possibility of ambiguity. The first is normally 364 + preferred over the second because it permits further verification of 365 + user intent (see Section 4.2.1). 366 + 367 + 4.2. Permitted Character and Label Validation 368 + 369 + 4.2.1. Input Format 370 + 371 + If both the U-label and A-label forms are available, the registry 372 + MUST ensure that the A-label form is in lowercase, perform a 373 + conversion to a U-label, perform the steps and tests described below 374 + on that U-label, and then verify that the A-label produced by the 375 + step in Section 4.4 matches the one provided as input. 
In addition, 376 + the U-label that was provided as input and the one obtained by 377 + conversion of the A-label MUST match exactly. If, for some reason, 378 + these tests fail, the registration MUST be rejected. 379 + 380 + If only an A-label was provided and the conversion to a U-label is 381 + not performed, the registry MUST still verify that the A-label is 382 + superficially valid, i.e., that it does not violate any of the rules 383 + of Punycode encoding [RFC3492] such as the prohibition on trailing 384 + hyphen-minus, the requirement that all characters be ASCII, and so 385 + on. Strings that appear to be A-labels (e.g., they start with 386 + "xn--") and strings that are supplied to the registry in a context 387 + reserved for A-labels (such as a field in a form to be filled out), 388 + but that are not valid A-labels as described in this paragraph, MUST 389 + NOT be placed in DNS zones that support IDNA. 390 + 391 + 392 + 393 + 394 + Klensin Standards Track [Page 7] 395 + 396 + RFC 5891 IDNA2008 Protocol August 2010 397 + 398 + 399 + If only an A-label is provided, the conversion to a U-label is not 400 + performed, but the superficial tests described in the previous 401 + paragraph are performed, registration procedures MAY, and usually 402 + will, bypass the tests and actions in the balance of Section 4.2 and 403 + in Sections 4.3 and 4.4. 404 + 405 + 4.2.2. Rejection of Characters That Are Not Permitted 406 + 407 + The candidate Unicode string MUST NOT contain characters that appear 408 + in the "DISALLOWED" and "UNASSIGNED" lists specified in the Tables 409 + document [RFC5892]. 410 + 411 + 4.2.3. Label Validation 412 + 413 + The proposed label (in the form of a Unicode string, i.e., a string 414 + that at least superficially appears to be a U-label) is then examined 415 + using tests that require examination of more than one character. 416 + Character order is considered to be the on-the-wire order. 
That 417 + order may not be the same as the display order. 418 + 419 + 4.2.3.1. Hyphen Restrictions 420 + 421 + The Unicode string MUST NOT contain "--" (two consecutive hyphens) in 422 + the third and fourth character positions and MUST NOT start or end 423 + with a "-" (hyphen). 424 + 425 + 4.2.3.2. Leading Combining Marks 426 + 427 + The Unicode string MUST NOT begin with a combining mark or combining 428 + character (see The Unicode Standard, Section 2.11 [Unicode] for an 429 + exact definition). 430 + 431 + 4.2.3.3. Contextual Rules 432 + 433 + The Unicode string MUST NOT contain any characters whose validity is 434 + context-dependent, unless the validity is positively confirmed by a 435 + contextual rule. To check this, each code point identified as 436 + CONTEXTJ or CONTEXTO in the Tables document [RFC5892] MUST have a 437 + non-null rule. If such a code point is missing a rule, the label is 438 + invalid. If the rule exists but the result of applying the rule is 439 + negative or inconclusive, the proposed label is invalid. 440 + 441 + 4.2.3.4. Labels Containing Characters Written Right to Left 442 + 443 + If the proposed label contains any characters from scripts that are 444 + written from right to left, it MUST meet the Bidi criteria [RFC5893]. 445 + 446 + 447 + 448 + 449 + 450 + Klensin Standards Track [Page 8] 451 + 452 + RFC 5891 IDNA2008 Protocol August 2010 453 + 454 + 455 + 4.2.4. Registration Validation Requirements 456 + 457 + Strings that contain at least one non-ASCII character, have been 458 + produced by the steps above, whose contents pass all of the tests in 459 + Section 4.2.3, and are 63 or fewer characters long in 460 + ASCII-compatible encoding (ACE) form (see Section 4.4), are U-labels. 
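The hyphen restrictions of Section 4.2.3.1 count character positions, not bytes, so a check on a UTF-8 string has to decode code points first. The following is a minimal sketch of that one test only, assuming OCaml 4.14 or later for `String.get_utf_8_uchar`. The names `code_points` and `hyphens_ok` are hypothetical and are not part of the `Publicsuffix` API; the other tests in Section 4.2.3 and the 63-octet ACE length limit are not covered here.

```ocaml
(* Hypothetical helper, not part of the Publicsuffix API.
   Decode a UTF-8 string into its code points (needs OCaml >= 4.14).
   Returns None if the input is not valid UTF-8. *)
let code_points (s : string) : Uchar.t list option =
  let len = String.length s in
  let rec go i acc =
    if i >= len then Some (List.rev acc)
    else
      let d = String.get_utf_8_uchar s i in
      if not (Uchar.utf_decode_is_valid d) then None
      else go (i + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  go 0 []

let hyphen = Uchar.of_char '-'

(* Section 4.2.3.1 only: the Unicode string must not have "--" in the
   third and fourth character positions and must not start or end
   with "-". Empty or malformed input is rejected by this sketch. *)
let hyphens_ok (label : string) : bool =
  match code_points label with
  | None | Some [] -> false
  | Some cps ->
    let a = Array.of_list cps in
    let n = Array.length a in
    let is_hyphen i = Uchar.equal a.(i) hyphen in
    not (is_hyphen 0)
    && not (is_hyphen (n - 1))
    && not (n >= 4 && is_hyphen 2 && is_hyphen 3)
```

Because positions are counted in code points, `hyphens_ok "食狮--x"` is rejected even though the two hyphens sit at byte offsets 6 and 7; a byte-indexed check would miss that case.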
461 + 462 + To summarize, tests are made in Section 4.2 for invalid characters, 463 + invalid combinations of characters, for labels that are invalid even 464 + if the characters they contain are valid individually, and for labels 465 + that do not conform to the restrictions for strings containing 466 + right-to-left characters. 467 + 468 + 4.3. Registry Restrictions 469 + 470 + In addition to the rules and tests above, there are many reasons why 471 + a registry could reject a label. Registries at all levels of the 472 + DNS, not just the top level, are expected to establish policies about 473 + label registrations. Policies are likely to be informed by the local 474 + languages and the scripts that are used to write them and may depend 475 + on many factors including what characters are in the label (for 476 + example, a label may be rejected based on other labels already 477 + registered). See the Rationale document [RFC5894], Section 3.2, for 478 + further discussion and recommendations about registry policies. 479 + 480 + The string produced by the steps in Section 4.2 is checked and 481 + processed as appropriate to local registry restrictions. Application 482 + of those registry restrictions may result in the rejection of some 483 + labels or the application of special restrictions to others. 484 + 485 + 4.4. Punycode Conversion 486 + 487 + The resulting U-label is converted to an A-label (defined in Section 488 + 2.3.2.1 of the Definitions document [RFC5890]). The A-label is the 489 + encoding of the U-label according to the Punycode algorithm [RFC3492] 490 + with the ACE prefix "xn--" added at the beginning of the string. The 491 + resulting string must, of course, conform to the length limits 492 + imposed by the DNS. This document does not update or alter the 493 + Punycode algorithm specified in RFC 3492 in any way. 
RFC 3492 does 494 + make a non-normative reference to the information about the value and 495 + construction of the ACE prefix that appears in RFC 3490 or Nameprep 496 + [RFC3491]. For consistency and reader convenience, IDNA2008 497 + effectively updates that reference to point to this document. That 498 + change does not alter the prefix itself. The prefix, "xn--", is the 499 + same in both sets of documents. 500 + 501 + 502 + 503 + 504 + 505 + 506 + Klensin Standards Track [Page 9] 507 + 508 + RFC 5891 IDNA2008 Protocol August 2010 509 + 510 + 511 + With the exception of the maximum string length test on Punycode 512 + output, the failure conditions identified in the Punycode encoding 513 + procedure cannot occur if the input is a U-label as determined by the 514 + steps in Sections 4.1 through 4.3 above. 515 + 516 + 4.5. Insertion in the Zone 517 + 518 + The label is registered in the DNS by inserting the A-label into a 519 + zone. 520 + 521 + 5. Domain Name Lookup Protocol 522 + 523 + Lookup is different from registration and different tests are applied 524 + on the client. Although some validity checks are necessary to avoid 525 + serious problems with the protocol, the lookup-side tests are more 526 + permissive and rely on the assumption that names that are present in 527 + the DNS are valid. That assumption is, however, a weak one because 528 + the presence of wildcards in the DNS might cause a string that is not 529 + actually registered in the DNS to be successfully looked up. 530 + 531 + 5.1. Label String Input 532 + 533 + The user supplies a string in the local character set, for example, 534 + by typing it, clicking on it, or copying and pasting it from a 535 + resource identifier, e.g., a Uniform Resource Identifier (URI) 536 + [RFC3986] or an Internationalized Resource Identifier (IRI) 537 + [RFC3987], from which the domain name is extracted. 
Alternately, 538 + some process not directly involving the user may read the string from 539 + a file or obtain it in some other way. Processing in this step and 540 + the one specified in Section 5.2 are local matters, to be 541 + accomplished prior to actual invocation of IDNA. 542 + 543 + 5.2. Conversion to Unicode 544 + 545 + The string is converted from the local character set into Unicode, if 546 + it is not already in Unicode. Depending on local needs, this 547 + conversion may involve mapping some characters into other characters 548 + as well as coding conversions. Those issues are discussed in the 549 + mapping-related sections (Sections 4.2, 4.4, 6, and 7.3) of the 550 + Rationale document [RFC5894] and in the separate Mapping document 551 + [IDNA2008-Mapping]. The result MUST be a Unicode string in NFC form. 552 + 553 + 5.3. A-label Input 554 + 555 + If the input to this procedure appears to be an A-label (i.e., it 556 + starts in "xn--", interpreted case-insensitively), the lookup 557 + application MAY attempt to convert it to a U-label, first ensuring 558 + that the A-label is entirely in lowercase (converting it to lowercase 559 + 560 + 561 + 562 + Klensin Standards Track [Page 10] 563 + 564 + RFC 5891 IDNA2008 Protocol August 2010 565 + 566 + 567 + if necessary), and apply the tests of Section 5.4 and the conversion 568 + of Section 5.5 to that form. If the label is converted to Unicode 569 + (i.e., to U-label form) using the Punycode decoding algorithm, then 570 + the processing specified in those two sections MUST be performed, and 571 + the label MUST be rejected if the resulting label is not identical to 572 + the original. See Section 8.1 of the Rationale document [RFC5894] 573 + for additional discussion on this topic. 
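The first step of Section 5.3 (recognising an apparent A-label case-insensitively and lowercasing it before any decoding) can be sketched as follows. This is an illustration under assumptions, not the library's implementation: `looks_like_a_label` and `normalize_a_label` are hypothetical names, and the sketch does not decode Punycode or perform the round-trip comparison that the section requires once a label has been converted to U-label form.

```ocaml
(* Hypothetical helpers, not part of the Publicsuffix API. *)

let ace_prefix = "xn--"

(* True when the label starts with "xn--", compared case-insensitively.
   This only says the label *appears* to be an A-label; it does not
   validate the Punycode that follows the prefix. *)
let looks_like_a_label (label : string) : bool =
  let n = String.length ace_prefix in
  String.length label >= n
  && String.equal (String.lowercase_ascii (String.sub label 0 n)) ace_prefix

(* Lowercase an apparent A-label so later steps see a canonical form;
   any other label is returned unchanged. *)
let normalize_a_label (label : string) : string =
  if looks_like_a_label label then String.lowercase_ascii label else label
```

A caller would apply `normalize_a_label` to each label of the input name before attempting Punycode decoding; labels that do not carry the prefix pass through untouched, so Unicode input is unaffected.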
574 + 575 + Conversion from the A-label and testing that the result is a U-label 576 + SHOULD be performed if the domain name will later be presented to the 577 + user in native character form (this requires that the lookup 578 + application be IDNA-aware). If those steps are not performed, the 579 + lookup process SHOULD at least test to determine that the string is 580 + actually an A-label, examining it for the invalid formats specified 581 + in the Punycode decoding specification. Applications that are not 582 + IDNA-aware will obviously omit that testing; others MAY treat the 583 + string as opaque to avoid the additional processing at the expense of 584 + providing less protection and information to users. 585 + 586 + 5.4. Validation and Character List Testing 587 + 588 + As with the registration procedure described in Section 4, the 589 + Unicode string is checked to verify that all characters that appear 590 + in it are valid as input to IDNA lookup processing. As discussed 591 + above and in the Rationale document [RFC5894], the lookup check is 592 + more liberal than the registration one. Labels that have not been 593 + fully evaluated for conformance to the applicable rules are referred 594 + to as "putative" labels as discussed in Section 2.3.2.1 of the 595 + Definitions document [RFC5890]. Putative U-labels with any of the 596 + following characteristics MUST be rejected prior to DNS lookup: 597 + 598 + o Labels that are not in NFC [Unicode-UAX15]. 599 + 600 + o Labels containing "--" (two consecutive hyphens) in the third and 601 + fourth character positions. 602 + 603 + o Labels whose first character is a combining mark (see The Unicode 604 + Standard, Section 2.11 [Unicode]). 605 + 606 + o Labels containing prohibited code points, i.e., those that are 607 + assigned to the "DISALLOWED" category of the Tables document 608 + [RFC5892]. 
609 + 610 + o Labels containing code points that are identified in the Tables 611 + document as "CONTEXTJ", i.e., requiring exceptional contextual 612 + rule processing on lookup, but that do not conform to those rules. 613 + Note that this implies that a rule must be defined, not null: a 614 + 615 + 616 + 617 + 618 + Klensin Standards Track [Page 11] 619 + 620 + RFC 5891 IDNA2008 Protocol August 2010 621 + 622 + 623 + character that requires a contextual rule but for which the rule 624 + is null is treated in this step as having failed to conform to the 625 + rule. 626 + 627 + o Labels containing code points that are identified in the Tables 628 + document as "CONTEXTO", but for which no such rule appears in the 629 + table of rules. Applications resolving DNS names or carrying out 630 + equivalent operations are not required to test contextual rules 631 + for "CONTEXTO" characters, only to verify that a rule is defined 632 + (although they MAY make such tests to provide better protection or 633 + give better information to the user). 634 + 635 + o Labels containing code points that are unassigned in the version 636 + of Unicode being used by the application, i.e., in the UNASSIGNED 637 + category of the Tables document. 638 + 639 + This requirement means that the application must use a list of 640 + unassigned characters that is matched to the version of Unicode 641 + that is being used for the other requirements in this section. It 642 + is not required that the application know which version of Unicode 643 + is being used; that information might be part of the operating 644 + environment in which the application is running. 645 + 646 + In addition, the application SHOULD apply the following test. 647 + 648 + o Verification that the string is compliant with the requirements 649 + for right-to-left characters specified in the Bidi document 650 + [RFC5893]. 
651 + 652 + This test may be omitted in special circumstances, such as when the 653 + lookup application knows that the conditions are enforced elsewhere, 654 + because an attempt to look up and resolve such strings will almost 655 + certainly lead to a DNS lookup failure except when wildcards are 656 + present in the zone. However, applying the test is likely to give 657 + much better information about the reason for a lookup failure -- 658 + information that may be usefully passed to the user when that is 659 + feasible -- than DNS resolution failure information alone. 660 + 661 + For all other strings, the lookup application MUST rely on the 662 + presence or absence of labels in the DNS to determine the validity of 663 + those labels and the validity of the characters they contain. If 664 + they are registered, they are presumed to be valid; if they are not, 665 + their possible validity is not relevant. While a lookup application 666 + may reasonably issue warnings about strings it believes may be 667 + problematic, applications that decline to process a string that 668 + conforms to the rules above (i.e., does not look it up in the DNS) 669 + are not in conformance with this protocol. 670 + 671 + 672 + 673 + 674 + Klensin Standards Track [Page 12] 675 + 676 + RFC 5891 IDNA2008 Protocol August 2010 677 + 678 + 679 + 5.5. Punycode Conversion 680 + 681 + The string that has now been validated for lookup is converted to ACE 682 + form by applying the Punycode algorithm to the string and then adding 683 + the ACE prefix ("xn--"). 684 + 685 + 5.6. DNS Name Resolution 686 + 687 + The A-label resulting from the conversion in Section 5.5 or supplied 688 + directly (see Section 5.3) is combined with other labels as needed to 689 + form a fully-qualified domain name that is then looked up in the DNS, 690 + using normal DNS resolver procedures. The lookup can obviously 691 + either succeed (returning information) or fail. 692 + 693 + 6. 
Security Considerations 694 + 695 + Security Considerations for this version of IDNA are described in the 696 + Definitions document [RFC5890], except for the special issues 697 + associated with right-to-left scripts and characters. The latter are 698 + discussed in the Bidi document [RFC5893]. 699 + 700 + In order to avoid intentional or accidental attacks from labels that 701 + might be confused with others, special problems in rendering, and so 702 + on, the IDNA model requires that registries exercise care and 703 + thoughtfulness about what labels they choose to permit. That issue 704 + is discussed in Section 4.3 of this document which, in turn, points 705 + to a somewhat more extensive discussion in the Rationale document 706 + [RFC5894]. 707 + 708 + 7. IANA Considerations 709 + 710 + IANA actions for this version of IDNA are specified in the Tables 711 + document [RFC5892] and discussed informally in the Rationale document 712 + [RFC5894]. The components of IDNA described in this document do not 713 + require any IANA actions. 714 + 715 + 8. Contributors 716 + 717 + While the listed editor held the pen, the original versions of this 718 + document represent the joint work and conclusions of an ad hoc design 719 + team consisting of the editor and, in alphabetic order, Harald 720 + Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. This document 721 + draws significantly on the original version of IDNA [RFC3490] both 722 + conceptually and for specific text. This second-generation version 723 + would not have been possible without the work that went into that 724 + first version and especially the contributions of its authors Patrik 725 + Faltstrom, Paul Hoffman, and Adam Costello. 
While Faltstrom was 726 + 727 + 728 + 729 + 730 + Klensin Standards Track [Page 13] 731 + 732 + RFC 5891 IDNA2008 Protocol August 2010 733 + 734 + 735 + actively involved in the creation of this version, Hoffman and 736 + Costello were not and should not be held responsible for any errors 737 + or omissions. 738 + 739 + 9. Acknowledgments 740 + 741 + This revision to IDNA would have been impossible without the 742 + accumulated experience since RFC 3490 was published and resulting 743 + comments and complaints of many people in the IETF, ICANN, and other 744 + communities (too many people to list here). Nor would it have been 745 + possible without RFC 3490 itself and the efforts of the Working Group 746 + that defined it. Those people whose contributions are acknowledged 747 + in RFC 3490, RFC 4690 [RFC4690], and the Rationale document [RFC5894] 748 + were particularly important. 749 + 750 + Specific textual changes were incorporated into this document after 751 + suggestions from the other contributors, Stephane Bortzmeyer, Vint 752 + Cerf, Lisa Dusseault, Paul Hoffman, Kent Karlsson, James Mitchell, 753 + Erik van der Poel, Marcos Sanz, Andrew Sullivan, Wil Tan, Ken 754 + Whistler, Chris Wright, and other WG participants and reviewers 755 + including Martin Duerst, James Mitchell, Subramanian Moonesamy, Peter 756 + Saint-Andre, Margaret Wasserman, and Dan Winship who caught specific 757 + errors and recommended corrections. Special thanks are due to Paul 758 + Hoffman for permission to extract material to form the basis for 759 + Appendix A from a draft document that he prepared. 760 + 761 + 10. References 762 + 763 + 10.1. Normative References 764 + 765 + [RFC1034] Mockapetris, P., "Domain names - concepts and 766 + facilities", STD 13, RFC 1034, November 1987. 767 + 768 + [RFC1035] Mockapetris, P., "Domain names - implementation and 769 + specification", STD 13, RFC 1035, November 1987. 
770 + 771 + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 772 + Requirement Levels", BCP 14, RFC 2119, March 1997. 773 + 774 + [RFC3492] Costello, A., "Punycode: A Bootstring encoding of 775 + Unicode for Internationalized Domain Names in 776 + Applications (IDNA)", RFC 3492, March 2003. 777 + 778 + [RFC5890] Klensin, J., "Internationalized Domain Names for 779 + Applications (IDNA): Definitions and Document 780 + Framework", RFC 5890, August 2010. 781 + 782 + 783 + 784 + 785 + 786 + Klensin Standards Track [Page 14] 787 + 788 + RFC 5891 IDNA2008 Protocol August 2010 789 + 790 + 791 + [RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and 792 + Internationalized Domain Names for Applications (IDNA)", 793 + RFC 5892, August 2010. 794 + 795 + [RFC5893] Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts 796 + for Internationalized Domain Names for Applications 797 + (IDNA)", RFC 5893, August 2010. 798 + 799 + [Unicode-UAX15] 800 + The Unicode Consortium, "Unicode Standard Annex #15: 801 + Unicode Normalization Forms", September 2009, 802 + <http://www.unicode.org/reports/tr15/>. 803 + 804 + 10.2. Informative References 805 + 806 + [ASCII] American National Standards Institute (formerly United 807 + States of America Standards Institute), "USA Code for 808 + Information Interchange", ANSI X3.4-1968, 1968. ANSI 809 + X3.4-1968 has been replaced by newer versions with 810 + slight modifications, but the 1968 version remains 811 + definitive for the Internet. 812 + 813 + [IDNA2008-Mapping] 814 + Resnick, P. and P. Hoffman, "Mapping Characters in 815 + Internationalized Domain Names for Applications (IDNA)", 816 + Work in Progress, April 2010. 817 + 818 + [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)", 819 + RFC 2671, August 1999. 820 + 821 + [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, 822 + "Internationalizing Domain Names in Applications 823 + (IDNA)", RFC 3490, March 2003. 824 + 825 + [RFC3491] Hoffman, P. and M. 
Blanchet, "Nameprep: A Stringprep 826 + Profile for Internationalized Domain Names (IDN)", 827 + RFC 3491, March 2003. 828 + 829 + [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 830 + Resource Identifier (URI): Generic Syntax", STD 66, 831 + RFC 3986, January 2005. 832 + 833 + [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource 834 + Identifiers (IRIs)", RFC 3987, January 2005. 835 + 836 + [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review 837 + and Recommendations for Internationalized Domain Names 838 + (IDNs)", RFC 4690, September 2006. 839 + 840 + 841 + 842 + Klensin Standards Track [Page 15] 843 + 844 + RFC 5891 IDNA2008 Protocol August 2010 845 + 846 + 847 + [RFC4952] Klensin, J. and Y. Ko, "Overview and Framework for 848 + Internationalized Email", RFC 4952, July 2007. 849 + 850 + [RFC5894] Klensin, J., "Internationalized Domain Names for 851 + Applications (IDNA): Background, Explanation, and 852 + Rationale", RFC 5894, August 2010. 853 + 854 + [Unicode] The Unicode Consortium, "The Unicode Standard, Version 855 + 5.0", 2007. Boston, MA, USA: Addison-Wesley. ISBN 856 + 0-321-48091-0. This printed reference has now been 857 + updated online to reflect additional code points. For 858 + code points, the reference at the time this document was 859 + published is to Unicode 5.2. 860 + 861 + 862 + 863 + 864 + 865 + 866 + 867 + 868 + 869 + 870 + 871 + 872 + 873 + 874 + 875 + 876 + 877 + 878 + 879 + 880 + 881 + 882 + 883 + 884 + 885 + 886 + 887 + 888 + 889 + 890 + 891 + 892 + 893 + 894 + 895 + 896 + 897 + 898 + Klensin Standards Track [Page 16] 899 + 900 + RFC 5891 IDNA2008 Protocol August 2010 901 + 902 + 903 + Appendix A. Summary of Major Changes from IDNA2003 904 + 905 + 1. Update base character set from Unicode 3.2 to Unicode version 906 + agnostic. 907 + 908 + 2. Separate the definitions for the "registration" and "lookup" 909 + activities. 910 + 911 + 3. 
Disallow symbol and punctuation characters except where special 912 + exceptions are necessary. 913 + 914 + 4. Remove the mapping and normalization steps from the protocol and 915 + have them, instead, done by the applications themselves, 916 + possibly in a local fashion, before invoking the protocol. 917 + 918 + 5. Change the way that the protocol specifies which characters are 919 + allowed in labels from "humans decide what the table of code 920 + points contains" to "decision about code points are based on 921 + Unicode properties plus a small exclusion list created by 922 + humans". 923 + 924 + 6. Introduce the new concept of characters that can be used only in 925 + specific contexts. 926 + 927 + 7. Allow typical words and names in languages such as Dhivehi and 928 + Yiddish to be expressed. 929 + 930 + 8. Make bidirectional domain names (delimited strings of labels, 931 + not just labels standing on their own) display in a less 932 + surprising fashion, whether they appear in obvious domain name 933 + contexts or as part of running text in paragraphs. 934 + 935 + 9. Remove the dot separator from the mandatory part of the 936 + protocol. 937 + 938 + 10. Make some currently valid labels that are not actually IDNA 939 + labels invalid. 940 + 941 + Author's Address 942 + 943 + John C Klensin 944 + 1770 Massachusetts Ave, Ste 322 945 + Cambridge, MA 02140 946 + USA 947 + 948 + Phone: +1 617 245 1457 949 + EMail: john+ietf@jck.com 950 + 951 + 952 + 953 + 954 + Klensin Standards Track [Page 17] 955 +