Domain Name System
On the Internet, the Domain Name System (DNS) is a critically important directory service that translates between a raw IP address (such as 207.46.197.32) and a domain name (such as microsoft.com). This allows people to interact with software via easier-to-remember domain names instead of numerical IP addresses. More importantly, it allows information to move around on the internet, from host to host, while people can still expect to find the information via its domain name. For example, a user need not care if Microsoft Corporation changes the IP address of one of its host computers; all the user needs to know is the domain name microsoft.com, and he or she can then (thanks to DNS) find Microsoft's computers regardless of which IP addresses are currently assigned to them.
DNS is a hierarchical database distributed widely across many host computers on the internet. It is also a set of application protocols for interacting with the database. The original purpose of DNS was to translate a domain name to an IP address (forward DNS), and an IP address to a domain name (reverse DNS)[1], but in recent years there have been ongoing attempts to expand the purpose and functionality of DNS in the internet. Further, because the lookup process for DNS superficially appears to resemble the lookup process for searching on the world wide web, it has become easy to confuse the purposes of a DNS lookup with a search-engine lookup. These two kinds of lookups have very different goals and occur at vastly different levels within the internet protocol stack. This article will explain the functions and purposes of the internet Domain Name System, the nature of its distributed and hierarchical database, and the protocols for accessing it. It will also note how the functions of DNS differ markedly from those of search engines, since this seems to be a matter of frequent confusion on the part of learners. In lay terms, you might think of DNS as like the white pages in a traditional phone book, and search engines as more like the yellow pages.
As the white-pages-style lookup service of the internet, DNS has been attacked by hostile programs attempting either to disrupt internet traffic or to divert it to illicit host machines. The distributed and simple approach taken by DNS has historically proved surprisingly resilient against such attacks, but as the size and importance of the internet have grown, so have the security concerns related to DNS. This article, or its related sub-articles, will also tackle security issues surrounding DNS.
DNS illustrates a core concept that has emerged in computer science called late binding. Using this approach, a user knows only the name of something but not its location; the user need not be aware if information moves from one computer to another, because the database will be updated along with the move so that people can always find the address when it is needed. DNS, introduced into the internet in 1983, is an early and successful implementation of this concept. Since the 1990s, late binding has often been used in compilers for programming languages and in linkers, which load programs into a computer's memory before and during their execution. Library science has embodied the concept in its Digital Object Identifier (DOI) standard for looking up publications online.
History
DNS was first introduced for use on the internet in 1983, with the first specification written by Paul Mockapetris.[2] Mockapetris' first DNS implementation was called JEEVES. It replaced the approach used in the ARPANET (pre-Internet) environment, which had few enough computers that a single file, hosts.txt, was sufficient to contain all connected computer names and their numeric addresses.[3] Its designers, however, did not think of it as anything like a search engine, with the ability to seek a name corresponding to an idea (e.g. "pizza"); it was meant to work with explicit names already known by the application. Sharing hosts files manually quickly became impossible to scale as the Internet grew, and DNS was designed and implemented as the solution to the problem of scalable host name resolution.
Note well: all DNS was designed to do was replace the hosts.txt file that held the name-to-address mappings for every computer in the ARPANET. That's all. DNS was not designed to be a search engine. Search engines had not been invented, since, after all, the Web had not been invented.
Protocol designers | Name & address authorities | System administrators |
---|---|---|
Standard formats for resource data. | Addresses for the root servers | The definition of zone boundaries |
Standard methods for querying the database | Unique assignments of domain names | Master files of data (i.e., sets of Resource Records (RR)) |
Standard methods for name servers to refresh local data from foreign name servers. | Operation, perhaps with delegation of the root servers and top-level domain servers | Statements of the refresh policies desired |
New requirements
Over the years, DNS has taken on more technical and administrative roles. These include providing additional information for the names and addresses, especially for security; the DNS infrastructure itself needed to be enhanced to be secure and trusted.[4] DNS originally was manually configured, but a variety of extensions have been needed to allow dynamic operation, such as the temporary binding of an address to a name.
The domain name space, as well as the address spaces for both Internet Protocol version 4 (IPv4) and Internet Protocol version 6 (IPv6), are under the authority of the Internet Corporation for Assigned Names and Numbers (ICANN), with much delegation of administration. The original system only handled IPv4, so one of the first steps for IPv6 support was defining how to represent IPv6 addresses in DNS.[5] Berkeley Internet Name Domain (BIND), first deployed in BSD 4.3 UNIX and written by Kevin Dunlap, was the first widespread DNS implementation. BIND is now open-source code supported by the Internet Systems Consortium.[6]
In the years DNS has served, Internet technology and operational issues have changed. When the new IPv6 address format came into use, the need to change name-to-address mapping tools to handle that format was understandable.
Less obvious, but still necessary, is the new requirement to track dynamically assigned addresses when there is no central address server. Domain Name System dynamic update can do such tracking, but dynamic update at this level is a security vulnerability. Address assignment spoofing is by no means the only threat to DNS, and an entire set of Domain Name System security extensions (DNSSEC) is being deployed.[4]
The U.S. government is requiring DNSSEC for all Federal information systems by December 2009.[7]
Domain name structure and schema
The DNS namespace is hierarchical. Individual domain and host names within it have a textual representation that mirrors the tree making up the schema of the DNS. The name en.citizendium.com appears to have three components, but actually has four. The naming hierarchy is a tree, with increasingly specific levels reading right to left.
From what can be seen in the textual example,
- .com is a top-level domain (TLD) under the authority of a TLD registry.
- .citizendium is a second-level domain (SLD) under the authority of an SLD registry
- .en identifies either a subdomain or a host, as defined by the citizendium.com technical administrator.
What cannot be seen is the hierarchically "zeroth" highest part, the root. If this usually suppressed part were displayed, the name would be written with a trailing dot:
en.citizendium.com.
The rightmost dot identifies the root of the DNS tree. In actual practice, there are multiple root servers, whose addresses are listed in an explicit file, a representative copy of which is found at http://www.internic.net/zones/named.root
It is defined as:
This file holds the information on root name servers needed to initialize cache of Internet domain name servers (e.g. reference this file in the "cache . <file>" configuration file of BIND domain name servers).
A fully qualified domain name can be traced from the hierarchically lowest host name to the root. For example, en.citizendium.org goes from the host en all the way up to the top-level domain .org, which is connected to the root.
A computer within the second-level domain citizendium.org could refer to the subdomain en, which would be a relative domain name; most DNS applications would append the current domain to the right of the host name. k12.en.citizendium.org is a hypothetical subdomain of en.citizendium.org; an arbitrary host could be larry.en.citizendium.org, and the DNS software would understand whether it is dealing with a host or a domain.
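The right-to-left reading of a name, and the way a relative name is completed, can be illustrated with a small sketch. The function names hierarchy and qualify are invented for this illustration and do not come from any DNS library; the code only manipulates strings and performs no lookups.

```python
def hierarchy(fqdn: str) -> list[str]:
    """List the domains on the path from the root down to the given name."""
    # Drop the trailing dot that stands for the root, then split into labels.
    stripped = fqdn.rstrip(".")
    path = ["."]  # the root is written as a single dot
    if not stripped:
        return path
    labels = stripped.split(".")
    # Walk the labels right to left, building progressively longer names.
    for i in range(len(labels) - 1, -1, -1):
        path.append(".".join(labels[i:]) + ".")
    return path


def qualify(name: str, current_domain: str) -> str:
    """Complete a relative name by appending the current domain (hypothetical helper)."""
    if name.endswith("."):
        return name  # a trailing dot marks the name as already fully qualified
    return f"{name}.{current_domain.rstrip('.')}."


print(hierarchy("en.citizendium.org."))
print(qualify("en", "citizendium.org"))
```

The first call lists the root, then org., then citizendium.org., then en.citizendium.org., which is the order in which the tree is descended.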
Domain name authority and issues
Name assignment
The administrative process of DNS name assignment involves both DNS registries and DNS registrars.
DNS registries
DNS registries' fundamental role is to operate the database for their top-level domain (TLD), and to authorize registrars as "retail" agents to provide customer service. The bulk of TLDs are national, and use International Organization for Standardization (ISO) two-letter country codes (e.g., Canada = .ca, China = .cn, Germany = .de, United Kingdom = .uk). A few, such as Tuvalu's .tv, form attractive branding; the country has few internal registrants but considerable income from outside registrants.
Country codes were not, at first, used, and the majority of registrations still go into the best-known TLD, .com. Some countries have a rational system where they use the "traditional" major suffix, or a variant of it, as a second-level domain, such as .co.uk or .ac.uk. This has not always been done in an intuitive manner; would a relatively naive user expect .com.uk or .co.uk, or .org.uk vs. .or.uk? [8]
Top-level domain | Registry | Comments |
---|---|---|
.aero | Societe Internationale de Telecommunications Aeronautiques SC, (SITA) | Sponsored by air transport industry |
.com | Verisign | Unsponsored |
.edu | Educause | Under U.S. government agreement, ending in 2011 |
.net | Verisign | Unsponsored |
.mil | Defense Information Systems Agency | U.S. government agency |
.org | Public Interest Registry (PIR) | Unsponsored; not-for-profit |
.biz | NeuLevel, Inc. | Unsponsored |
There is a continuing business, political, and technical argument about the desirability of more TLDs, especially from those that want TLDs that are suggestive of the business purpose of a registrant. From a technical standpoint, while a proliferation of TLDs would not, as once suspected, seriously impact DNS performance, it would be likely to increase customer support cost due to the likelihood of making mistakes and getting the wrong domain.
Another argument, the details of which involve intellectual property issues beyond the scope of this article, is the legal theory that a trademark must be "defended" or risks going into the public domain. If a second-level domain is identical to a trademarked company name, does the company have exclusive rights to it? Intellectual property attorneys have often argued that a well-known-company is not "defending" its trademark if it allows a domain to be created with its name, so there has been a tendency that whenever some TLD ".new" is created, trademark holders rush to register "well-known-company.new". Speculators, meanwhile, rush to do so before the trademark holder can do so, and, if successful, sell the rights to the domain at a very high price.
One especially hotly argued and unresolved issue is whether sexually-oriented businesses should have a .xxx TLD; some of those arguing against it also want to restrict access to sexually-oriented content, which would be identified by the TLD. Obviously, there would be no way to enforce keeping sexually-oriented content in .xxx, but it could reasonably be assumed that if a domain were in .xxx, it was sexually-oriented.
DNS registrars
Registrars are the "retail" side of DNS operation. In .com and many other TLDs, they are profit-making entities. They deal with organizations that wish to acquire particular domain names, verifying that the name is available, and then handling the administrative interaction with the domain registry.
Most registrars are reasonable and ethical. They may be subdivisions of companies that can sell additional services, such as web server hosting, to domain registrants. Frequently, they have user support functions that will help new DNS administrators set up their zone files, or they may actually operate name servers on behalf of registrants. If there is a dispute over the rights to a domain name, one's registrar can be a valuable ally.
There are registrars that compete for the business of large hosting centers and other organizations that need many domain names, typically discounting the registration fee to multiple-domain customers. It is to the advantage of a registrar to keep its existing customers, as most domains will be renewed, producing a continuing income stream. Registrars want to avoid "churn", a name for customers changing to other registrars.
Some registrars, unfortunately, act against the original tradition of the Internet as a shared resource and of DNS as a service. Domain registrations expire annually, although one can pay the registrar to renew a registration automatically. It is not uncommon for certain registrars to look for domain names that expire in the near term and were registered by a different registrar, and to send the domain administrators what appear to be legitimate renewal notices. If the notice is completed and returned with payment, such a registrar will indeed renew the domain name, but will also transfer it away from the existing registrar.
Legal and business issues associated with domain names
When the ARPANET, and then the Internet, were new, DNS was seen as a simple mechanism to avoid memorizing or typing host addresses. As the Internet became more commercial, domain names acquired business value, since new users were apt to look for "company" at company.com. Indeed, as unpleasant to the DNS-knowledgeable ear as it may be, there are a substantial number of enterprises that have "dot-com", or sometimes other TLDs, as part of their corporate name.
Name servers and zone files
A [sub]domain is a name space that need not have names in it. The basic source of name information that goes into a particular space is a zone file, created manually or with software assistance.
Just as the DNS namespace is a tree of domains, the actual information in that namespace can be regarded as a tree of zone files.
Name servers are computers that contain information about domains, all the way up to the root. Be sure to understand the difference between the abstraction of a domain or subdomain namespace, and the zone file that describes the contents of that namespace and is actually loaded by a name server. The primary name server is authoritative for a domain, and contains the master copy of the zone file for that domain.
Name servers can contain more than one zone file; indeed, this is the usual case when there are domains with subdomains.
Depending on the implementation, a name server may cache information in addition to what it learned from the zone file. For example, a local cache file in a name server could contain data about name-address relationships outside the domain, but which have been needed by a client within that domain. The name server may also contain limited-lifetime dynamic name updates, which might or might not be accessible from outside the domain.
RFC1034, the basic DNS conceptual specification, describes two ways of looking up names: iterative lookup, which every name server must support, and recursive lookup, which is optional.[9] The same logic is relevant inside a domain that has caching nameservers.
- Iterative: the server refers the client to another server and lets the client pursue the query; the client is aware of multiple nameservers but is only interacting with one at a time
- Recursive: the first server pursues the query for the client at another server; the client is aware of only one DNS server
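The difference between the two modes can be sketched with a toy model in which name servers are plain dictionaries. Everything here is an assumption made for illustration: the server names (root, org-server, cz-server), the address 192.0.2.10, and the helper functions are invented, and no network traffic is involved.

```python
# Toy name servers: each holds either final answers or referrals to another server.
SERVERS = {
    "root": {"referrals": {"org.": "org-server"}},
    "org-server": {"referrals": {"citizendium.org.": "cz-server"}},
    "cz-server": {"answers": {"en.citizendium.org.": "192.0.2.10"}},
}


def ask(server: str, name: str) -> tuple[str, str]:
    """Ask one server; return ("answer", address) or ("referral", next_server)."""
    data = SERVERS[server]
    if name in data.get("answers", {}):
        return "answer", data["answers"][name]
    for suffix, next_server in data.get("referrals", {}).items():
        # A referral applies when the name is the suffix itself or lies beneath it.
        if name == suffix or name.endswith("." + suffix):
            return "referral", next_server
    raise LookupError(f"{server} knows nothing about {name}")


def iterative_lookup(name: str, start: str = "root") -> tuple[str, list[str]]:
    """Iterative mode: the client follows every referral itself."""
    contacted = []
    server = start
    while True:
        contacted.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, contacted
        server = value  # the referral names the next server to try


def recursive_lookup(name: str, first_server: str = "root") -> str:
    """Recursive mode: the first server chases the referrals on the client's behalf."""
    address, _ = iterative_lookup(name, first_server)
    return address  # the client only ever sees this final answer


print(iterative_lookup("en.citizendium.org."))
print(recursive_lookup("en.citizendium.org."))
```

In the iterative call the client ends up with the list of every server it contacted, whereas the recursive call hides that list behind a single reply.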
Domains versus zones
At each of these levels is an abstract namespace. No other second-level domain could have notcz.citizendium.com, but the administrator of citizendium.com is not obligated to have any number of subordinate hosts or domains. There is a subtle distinction between the abstraction of a name space, and a zone file that actually defines the hosts and subdomains in the zone.
Resource records
Zone files are made up of resource records (RR). All RRs have several common properties:
- owner: the domain in which the authoritative RR resides. This is often implicitly derived from context, perhaps relative to the current domain name
- type: an encoded 16 bit value that defines the type of resource described by the current record. Some types are obsolete, while others continue to be added for new DNS functions.
- class: an obsolete but required field, it is a 16 bit value for the protocol family with which the RR is associated. The only value used is the Internet, textually represented as IN
- time to live: commonly called TTL, this parameter specifies how long the RR may be kept in a cache and assumed to be valid. It is a 32 bit integer, whose value is measured in seconds
- RDATA: type-specific data about the resource
While there are many graphic tools for creating RRs, the basic textual syntax is:
[owner] [TTL] [class] [type] [rdata]
For example, the RR defining the address associated with the name XX.LCS.MIT.EDU[10] is:
XX.LCS.MIT.EDU. IN A 10.0.0.44
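As a rough illustration of how the fields of such a line relate to the properties listed above, the following sketch splits a one-line RR into owner, optional TTL, optional class, type, and RDATA. The ResourceRecord type and the parse_rr function are invented for this example; the sketch assumes the owner is always present and ignores comments, parentheses, and other master-file features.

```python
from typing import NamedTuple, Optional


class ResourceRecord(NamedTuple):
    owner: str
    ttl: Optional[int]  # None when the line relies on the zone default
    rr_class: str
    rr_type: str
    rdata: str


def parse_rr(line: str) -> ResourceRecord:
    """Parse a single-line RR written as: owner [ttl] [class] type rdata."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"too few fields in RR: {line!r}")
    owner, rest = fields[0], fields[1:]
    ttl = None
    rr_class = "IN"  # assumption: default to the Internet class when omitted
    if rest[0].isdigit():
        ttl = int(rest.pop(0))
    if rest and rest[0].upper() == "IN":
        rr_class = rest.pop(0).upper()
    if len(rest) < 2:
        raise ValueError(f"missing type or RDATA in RR: {line!r}")
    rr_type = rest.pop(0).upper()
    # Whatever remains is type-specific data, kept as a single string.
    return ResourceRecord(owner, ttl, rr_class, rr_type, " ".join(rest))


print(parse_rr("XX.LCS.MIT.EDU. IN A 10.0.0.44"))
```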
Type | RR Name | Function | Typical RDATA |
---|---|---|---|
SOA | Start Of Authority | Defines the start of a zone or a subzone; subordinate records inherit parameters | Multiple fields |
A | Address IPv4 | Specifies the IPv4 address for a host | IPv4 Address |
AAAA | Address IPv6 | Specifies the IPv6 address for a host | IPv6 Address |
PTR | "Pointer" | Reverse mapping of address to name | Name |
CNAME | Canonical name | Specifies that the owner name is an alias for another (canonical) name | Name |
NS | Name server | Names an authoritative name server for a zone; in a parent zone, delegates a subdomain to its name servers | Name |
MX | Mail exchanger | Identifies a host willing to act as a mail exchange for the owner domain | A 16 bit preference value (lower is better) followed by the host name of the mail exchange |
Wildcards in Resource Records
An additional complexity of RRs is that they may contain wildcards. In the simplest example, a "*" character in a name expression will match any string. In specific situations, this is an extremely useful function, but it can complicate troubleshooting.[11]
In 2003, Verisign, which operates the .com registry, inserted a wildcard into the master DNS files, so that an undefined name, rather than returning an error message, would be redirected to one of the registry's commercial search engines.[12] If the World Wide Web were the only function on the Internet, this might, although revenue-generating, have been useful. Unfortunately, there are many other functions on the Internet. In particular, messaging application protocols such as the Simple Mail Transfer Protocol (SMTP) use the "host not found" information to conclude that mail to that host is undeliverable; with the wildcard in place, that error was never returned.
A quite useful application of a wildcard, however, would be in a split DNS application, with different name resolution policies on different sides of a firewall. On the public Internet side of the firewall, the DNS server for example.com would have explicit records for the organization's public web server, mail server, and other public servers. Any reference to "inside" addresses, however, would be handled by the record:
*.example.com IN A [outside address of the firewall]
Domain Name System security, however, does not have a complete solution to working with wildcarded RRs.
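The split DNS case can be sketched as a lookup that prefers explicit records and falls back to a wildcard owner. This is a deliberately simplified model and not the full matching procedure of RFC4592; the zone contents, the lookup function, and the 203.0.113.x addresses are all assumptions for illustration.

```python
from typing import Optional

# Hypothetical public-side zone: explicit public servers plus one wildcard
# that points every other name at the outside address of the firewall.
PUBLIC_ZONE = {
    "www.example.com": "203.0.113.80",
    "mail.example.com": "203.0.113.25",
    "*.example.com": "203.0.113.1",
}


def lookup(zone: dict[str, str], name: str) -> Optional[str]:
    """Return the address for a name, preferring an explicit record over a wildcard."""
    if name in zone:
        return zone[name]
    labels = name.split(".")
    # Replace leading labels with "*" one at a time, nearest wildcard first.
    for i in range(1, len(labels)):
        candidate = "*." + ".".join(labels[i:])
        if candidate in zone:
            return zone[candidate]
    return None  # corresponds to a "name not found" response


print(lookup(PUBLIC_ZONE, "www.example.com"))      # explicit record
print(lookup(PUBLIC_ZONE, "payroll.example.com"))  # caught by the wildcard
print(lookup(PUBLIC_ZONE, "example.org"))          # outside the zone
```

The second call shows why troubleshooting becomes harder: a mistyped or nonexistent name under example.com still receives an answer instead of an error.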
Deploying DNS
To understand basic DNS, assume that it is being used in a single organization, which has one technical and administrative authority in control. In other words, the domain and its subdomains are homogeneous. While there may be minor exceptions due to the existence of temporarily cached data in individual clients and servers, and not all clients and servers may be able to view all parts of the highest-level domain, a single organization's DNS is essentially a distributed data base, where there are multiple copies of a single "golden copy" of information.
Once one starts interconnecting domains under different authority, as in the Internet, both administrative and technical aspects change. First, it is understood that while the total collection of all domains conceptually has access to all public name information, no one domain will have a copy of all information. Rather than being a distributed database, it has become a federated database, where there is a common indexing and retrieval model, but requests may need to go to multiple servers, in multiple domains and subdomains, before the request is satisfied.
Second, even between well-recognized business partner organizations, there are trust issues. Third, there are miscreants actively attacking the DNS, for reasons from ideology to technical status to pure criminal revenue.
Basic Implementation
The administrator of a homogeneous domain (and its subdomains) starts by building a zone file that defines the names and addresses of hosts in that zone, optional additional information to be added to the responses, and a reference to a higher-level nameserver that helps connect the domain of the zone to other domains. For example, if one was in a.com, one would have to go to the nameserver of .com to find the address of the b.com nameserver.
SOA RR
The zone/domain name starts the record; it must end with a trailing period. Assume that it is sub.example.com.
In the resource data, the first field names the primary name server for this domain. In this case, it might be ns1.sub.example.com.
Next comes the mail address of the person or role responsible for the data in this domain, written not in the conventional user@domain form, but in the syntax of a DNS name in a zone file. In this example it is written administrator.sub.example.com. with a trailing period; to create the mail address, replace the leftmost period with an "@" symbol and remove the trailing period, which gives administrator@sub.example.com
Following the administrator are several parameters that may have defaults, but should be known. The first is the serial number of this version of the zone file, which will increase whenever this file is updated.
The next four are timers for the domain, specified in seconds:
- refresh interval: Secondary name servers in the domain should check the primary for new data after this number of seconds expires
- retry interval: If the secondary was unable to get an update when the refresh interval expires, this parameter tells the secondary how long to wait before retrying. The value in this field is usually less than the refresh interval
- expire interval: If the secondary was unable to get an update before this timer expires, it should assume that all of the RR information in its copy of the zone file is no longer valid. If this timer triggers, the secondary server will stop responding to DNS requests for the zone
- TTL: The default TTL for RRs in this zone. An appropriate TTL is controversial, and may be quite different on an internal nameserver versus one accessible from the Internet. The shorter the interval, the more accurate is the data, and, further, the better it is for name-based load distribution schemes. The longer the interval, the less DNS traffic is generated
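The SOA fields described above can be gathered into a small sketch that converts the responsible-person field into a mail address and shows how a secondary might act on the timers. The SOA class, the helper names, and all the numeric values are assumptions for illustration; the serial comparison is a plain integer comparison, which real implementations refine, and mail names whose local part itself contains a period are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SOA:
    primary: str  # primary name server for the zone
    rname: str    # responsible person, written as a DNS name
    serial: int   # version of the zone file
    refresh: int  # seconds between checks of the primary
    retry: int    # seconds to wait after a failed check
    expire: int   # seconds after which the copy is considered invalid
    ttl: int      # default TTL for RRs in the zone


def rname_to_email(rname: str) -> str:
    """Replace the leftmost period with "@" and drop the trailing period."""
    local, _, domain = rname.rstrip(".").partition(".")
    return f"{local}@{domain}"


def secondary_action(soa: SOA, seconds_since_update: int,
                     primary_serial: Optional[int]) -> str:
    """Decide what a secondary does; primary_serial is None if the primary is unreachable."""
    if seconds_since_update < soa.refresh:
        return "keep serving"  # no check is due yet
    if primary_serial is None:
        if seconds_since_update >= soa.expire:
            return "stop answering"  # the local copy has expired
        return f"retry in {soa.retry} seconds"
    if primary_serial > soa.serial:
        return "zone transfer"  # the primary has newer data
    return "keep serving"


soa = SOA("ns1.sub.example.com.", "administrator.sub.example.com.",
          2024010101, 3600, 600, 604800, 86400)
print(rname_to_email(soa.rname))
print(secondary_action(soa, 4000, 2024010102))
```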
NS RR
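Names a name server that is authoritative for a zone. The RDATA is the host name of the server rather than its IP address; the address comes from a separate A or AAAA record. NS records placed in a parent zone delegate a subdomain to its own name servers, which is how a query that cannot be answered from local information is referred onward.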
A and AAAA RR
Code the authoritative host name and its address, and, optionally, the TTL if different from the zone TTL.
PTR RR
Code an address and the corresponding host name, and, optionally, the TTL if different from the zone TTL.
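In a zone file, the address in a PTR record is itself written as a domain name: the address is reversed and placed under in-addr.arpa for IPv4 or ip6.arpa for IPv6. The Python standard library can produce that name; the ptr_owner wrapper below is a hypothetical helper that only adds the trailing period.

```python
import ipaddress


def ptr_owner(address: str) -> str:
    """Return the owner name under which the PTR record for an address is stored."""
    # reverse_pointer yields e.g. "44.0.0.10.in-addr.arpa" without the root dot.
    return ipaddress.ip_address(address).reverse_pointer + "."


print(ptr_owner("10.0.0.44"))
print(ptr_owner("2001:db8::1"))
```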
CNAME RR
Code an alias host name and the canonical host name to which it refers, and, optionally, the TTL if different from the zone TTL.
Resource Record sets (RRsets)
While no two RRs should have the same label and type and data all equal, it is perfectly possible to have RRs with the same label and type, but different RDATA. For example, a physically multihomed server could have four network interface cards (NIC), each on a different subnet. The set of addresses for this host name (i.e., label) would reasonably form a set of four A records with different address data. Such a set of records is called a Resource Record Set (RRSet). [13]
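Grouping records into RRsets amounts to collecting RDATA values under a (label, type) key. The sketch below assumes records are given as simple (owner, type, rdata) tuples; the host name and the 192.0.2.x addresses are invented for the multihomed-server example.

```python
from collections import defaultdict


def group_rrsets(records):
    """Group (owner, type, rdata) tuples into RRsets keyed by (owner, type)."""
    rrsets = defaultdict(set)
    for owner, rr_type, rdata in records:
        # Names are compared case-insensitively; a set drops exact duplicates.
        rrsets[(owner.lower(), rr_type.upper())].add(rdata)
    return dict(rrsets)


records = [
    ("multi.example.com.", "A", "192.0.2.1"),
    ("multi.example.com.", "A", "192.0.2.65"),
    ("multi.example.com.", "A", "192.0.2.129"),
    ("multi.example.com.", "A", "192.0.2.193"),
    ("MULTI.example.com.", "A", "192.0.2.1"),  # duplicate of the first record
    ("multi.example.com.", "MX", "10 mail.example.com."),
]
rrsets = group_rrsets(records)
print(sorted(rrsets[("multi.example.com.", "A")]))
```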
Obtaining root information
The root name server zone file is expected to be retrieved, by anonymous FTP, from various well-known sites approved by ICANN. In practice, most DNS implementations ship with a recent copy. Root servers remain very busy.[3] In fact, while the root server zone file mentioned above gives the names and addresses of root servers in the general form a.root-servers.net, the address of a particular server is of the anycast type;[14] there are multiple physical computers with that address, for fault tolerance and load sharing.
For each domain, there must be at least one, and preferably more than one name server that holds the zone files. Primary domain servers have the authoritative zone files, and secondary domain servers keep an exact copy of the primary's zone file. Both types are assumed to have a disk or other storage from which they can restore the domain information.
A secondary server will use a zone transfer to obtain the primary zone file for its domain. There are various operational reasons why a physical server might act as primary and secondary for multiple zones; the important point here is that a zone transfer, as opposed to ordinary DNS retrieval, alters the contents of the definitions and must be treated as a sensitive operation.
The nameserver also can take dynamic updates, which, strictly speaking, do not have to be secured, but dynamic update, especially in an IPv6 environment, is so open an invitation to miscreants that it should never be considered without being secured. DNS security is the normal way this might be done, but there are other alternatives, such as an encrypted link between the update source and the nameserver.
There are also caching-only servers that contain only the names and addresses that have been recently looked up, and are still valid with respect to the TTL parameter in the relevant records.
The program, on a host, which is the client of DNS servers is most often called a resolver. Depending on the local network architectural implementation, a resolver may go to a caching-only server, a secondary server, or the primary server for its information. It may retain a cache of recently retrieved DNS information, clearing items from cache as their TTLs expire.
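The caching behavior of a resolver or caching-only server can be sketched as a dictionary whose entries carry an expiry time derived from the TTL. The TTLCache class is invented for this illustration; the clock is passed in as a parameter so that expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Keep name-to-address answers only for as long as their TTL allows."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl: int) -> None:
        # Record the moment at which this answer stops being valid.
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # clear the item once its TTL has expired
            return None
        return address


now = [0.0]  # a fake clock, advanced by hand for the demonstration
cache = TTLCache(clock=lambda: now[0])
cache.put("en.citizendium.org.", "192.0.2.10", ttl=300)
print(cache.get("en.citizendium.org."))  # still valid
now[0] = 300.0
print(cache.get("en.citizendium.org."))  # expired, so a fresh query is needed
```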
Heterogeneous DNS
While there will be different federated databases, DNS is certainly not limited to the public Internet. It is quite common for organizations to have split DNS "inside the firewall" and "outside the firewall". An inside user will query local DNS for the address of an internal machine and get the address of the actual host, but, if it asks for the address of citizendium.com, the address returned by DNS may well be that of the "inside" interface of a firewall, or other security middlebox.[15] Depending on the firewall implementation, it may deny access, or create a proxy connection to the outside host. To establish that connection, the middlebox will query an "outside" DNS, which contains the addresses of the organization's public hosts, but primarily contains the addresses of external hosts. In some cases, that outside DNS enjoys some trust with an external organization, and may do secured zone transfers. More often, however, the outside DNS is primarily a cache of name-address information that it obtained by queries to the nameservers of other domains.
DNS protocols
The most basic DNS protocols are the lookup service, which runs over port 53 of the connectionless User Datagram Protocol, and the zone transfer service, which also runs over port 53 of the connection-oriented Transmission Control Protocol.[16] Lookup is a read-only function, while zone update is read-write and should be implemented as a privileged, authenticated operation. Otherwise any client on a DNS server's network could request a zone transfer, and receive a complete copy of a zonefile, which is a security risk.
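To give a feel for what travels over port 53 in a lookup, the following sketch assembles the bytes of a simple query in the message format of RFC1035: a 12-byte header followed by one question. The build_query function and the fixed query identifier are assumptions for illustration; the sketch only constructs the message and does not send it.

```python
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a DNS query message for one name (qtype 1 is an A record)."""
    # Header: ID, flags (only "recursion desired" set), one question,
    # and zero answer, authority, and additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        if not 0 < len(encoded) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        # Each label is preceded by a single length byte.
        qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"  # the zero-length label stands for the root
    question = qname + struct.pack("!HH", qtype, 1)  # class 1 is IN
    return header + question


packet = build_query("example.com")
print(len(packet), packet.hex())
```

A client would hand these bytes to a UDP socket addressed to port 53 of its name server and parse the reply, which uses the same header layout.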
There are also protocols for dynamic update, so that network clients can automatically update their DNS servers to reflect correct hostnames (e.g. if they dynamically receive a different IP address via DHCP). This concept is also known as Dynamic DNS. [17]
Extended applications
These include Domain Name System dynamic update, use of the DNS as a data base in Public Key Infrastructure for security, Domain Name System security (DNSSEC) and name-based routing and load distribution.
References
- Mockapetris, P.V. (November 1987), Domain names - concepts and facilities, Internet Engineering Task Force, RFC1034
- Mockapetris, P.V. (November 1983), Domain names: Concepts and facilities, Internet Engineering Task Force, RFC882
- Albitz, Paul & Cricket Liu (1997), DNS and BIND, second edition, O'Reilly, p. 9
- Arends, R. et al. (March 2005), DNS Security Introduction and Requirements, Internet Engineering Task Force, RFC4033
- Bush, R. et al. (August 2002), Representing Internet Protocol version 6 (IPv6) Addresses in the Domain Name System (DNS), Internet Engineering Task Force, RFC3363
- http://www.isc.org/index.pl
- Evans, Karen (August 22, 2008), Securing the Federal Government’s Domain Name System Infrastructure (Submission of Draft Agency Plans Due by September 5, 2008)
- Dyer, Stephen (October 1, 2004), .UK – Revisited
- RFC1034, pp. 3-4
- Note that the actual RR has a terminal period that does not appear when the DNS name is written in other uses
- Lewis, E. (July 2006), The Role of Wildcards in the Domain Name System, RFC4592
- Internet Corporation for Assigned Names and Numbers, Verisign's Wildcard Service Deployment
- Elz, R. & R. Bush (July 1997), Clarifications to the DNS Specification, Internet Engineering Task Force, RFC2181
- Liman, Lars-Johan et al., Operation of the Root Name Servers
- Srisuresh, P., J. Kuthan, J. Rosenberg, A. Molitor, A. Rayhan (August 2002), Middlebox communication architecture and framework, RFC3303
- Mockapetris, P.V. (November 1987), Domain names - implementation and specification, Internet Engineering Task Force, RFC1035
- Vixie, P., ed. (April 1997), Dynamic Updates in the Domain Name System (DNS UPDATE), Internet Engineering Task Force, RFC2136