
Domain Name System (DNS)

Goal: Design a system to look up domain names that can scale to the planet-wide internet and handle
queries on billions of objects.

The Internet Domain Name System (DNS) is the naming system for nodes on the Internet. It
associates human-friendly names with numeric IP addresses and other information about
that node.

Introduction
The Internet Domain Name System (DNS) is the distributed system that enables the lookup of
hundreds of millions of domain names. It is an application-specific system rather than a
generic object store, but it is software we rely on every time we access a web page, send
email, or contact any system on the Internet by name.

How are IP addresses assigned?


Before we get to Internet domain names, let’s touch on IP addresses. The Internet employs a
hierarchical system for assigning IP addresses.

A global non-profit organization called ICANN, or the Internet Corporation for Assigned
Names and Numbers, is responsible for managing IP addresses, autonomous system
numbers that are used for routing, and the domain name system.

The Internet Assigned Numbers Authority, the IANA, is a department within ICANN that
is responsible for assigning IP addresses and managing top-level domains. The IANA
allocates chunks of the IP address space to five organizations called Regional Internet
Registries (RIR).
These cover large geographic areas.
For instance, ARIN, the American Registry for Internet Numbers, covers the U.S., Canada,
and parts of the Caribbean.

The full list of RIRs can be found at nro.net and comprises:

1. AFRINIC: African Network Information Centre
2. APNIC: Asia Pacific Network Information Centre
3. ARIN: American Registry for Internet Numbers (U.S., Canada, Caribbean and North Atlantic
islands)
4. LACNIC: Latin American and Caribbean Internet Addresses Registry
5. RIPE NCC: Réseaux IP Européens Network Coordination Centre (Europe, the Middle East,
and parts of Central Asia)
These Regional Internet Registries, in turn, assign ranges of IP addresses to ISPs and other
autonomous systems. An autonomous system (AS) is the term for a collection of IP networks
and routers that are under the control of a single organization that presents a common
routing policy to the Internet. Each AS is identified by a unique number. Like IP addresses,
the top-level range is controlled by the IANA, and then the individual RIRs assign them to
the network operators within their jurisdictions. These network operators can then assign
smaller ranges or individual addresses to smaller ISPs or to their customers. For example,
Rutgers is an Autonomous System (AS46) and owns the range of IP addresses 128.6.0.0 –
128.6.255.255 (128.6.0.0/16) and 165.230.0.0 – 165.230.255.255 (165.230.0.0/16), as well as a few
other smaller ranges.
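The ranges above are written in CIDR notation, where the number after the slash is the length of the network prefix. As a quick check, here is a minimal sketch using Python's standard `ipaddress` module. `RUTGERS_BLOCKS` holds only the two ranges quoted above, and `in_rutgers_ranges` is a hypothetical helper, not part of any registry tool.

```python
import ipaddress

# The two Rutgers (AS46) blocks quoted in the text, in CIDR notation.
RUTGERS_BLOCKS = [
    ipaddress.ip_network("128.6.0.0/16"),
    ipaddress.ip_network("165.230.0.0/16"),
]


def in_rutgers_ranges(addr: str) -> bool:
    """Return True if addr falls inside one of the listed blocks (hypothetical helper)."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in RUTGERS_BLOCKS)


# A /16 prefix fixes the first 16 bits, leaving 16 bits (65,536 addresses) per block.
for block in RUTGERS_BLOCKS:
    print(block, block[0], "-", block[-1], block.num_addresses)
```

The printed first and last addresses match the dotted ranges given in the text.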
Organizations may get permanent or temporary addresses assigned to them. With a
permanent assignment, you essentially get the IP address forever. With a temporary one, you
need to request an available address and get one that you must renew periodically.

How are machine names assigned?


When the Internet was young, in the days when it was the ARPANET, all computer names
and their corresponding addresses were managed centrally: Jon Postel coordinated address
assignments, and the master host table was maintained by the Stanford Research Institute’s
Network Information Center (SRI-NIC).

Computer names formed a flat namespace: each name had to be unique and there was no
concept of domains or any form of hierarchy. Machines had names such as UCBVAX for a
certain VAX computer at UC Berkeley or DECWRL for a computer at Digital Equipment
Corporation’s Western Research Lab.

If you had a system on the Internet, you would periodically download the latest copy of
the hosts.txt file from SRI-NIC via FTP. It was a text file that contained the names of all the
computers on the ARPANET and their corresponding IP addresses. By searching this file,
programs could look up the address corresponding to a specific machine name.

This worked well when there weren’t a lot of hosts on the Internet. Until around 1990, the
Internet was accessible only to companies and universities working on Department of
Defense projects. As the number of hosts on the Internet grew, the system didn’t scale:
asking people to download a file containing all the hosts on the Internet stopped being
practical because the file would get huge and the information within it would change too
frequently.
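A flat host table amounts to a single dictionary from names to addresses. The sketch below assumes a simplified `address name` line format modeled on today's hosts files (the historical HOSTS.TXT layout was different), and its entries use placeholder addresses from the 192.0.2.0/24 documentation range rather than the machines' real historical addresses. `parse_hosts` and `lookup` are hypothetical names.

```python
from typing import Dict, Optional

# Placeholder table in a simplified "address name" format. The addresses come from
# the 192.0.2.0/24 documentation range, not from the historical ARPANET host table.
SAMPLE_HOSTS = """\
# address     name
192.0.2.10    UCBVAX
192.0.2.20    DECWRL
"""


def parse_hosts(text: str) -> Dict[str, str]:
    """Build a name -> address table from a flat hosts file."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Flat namespace: a name maps to exactly one address, with no hierarchy.
            table[name.lower()] = address
    return table


def lookup(table: Dict[str, str], name: str) -> Optional[str]:
    return table.get(name.lower())


hosts = parse_hosts(SAMPLE_HOSTS)
print(lookup(hosts, "decwrl"))  # 192.0.2.20
```

Every machine holds the entire table, so each new or changed host means a fresh copy of the file for everyone, which is the scaling problem described above.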

Domain name hierarchy


Coming up with names for computers also became an issue. It is challenging to create and
manage meaningful unique names on a large scale (e.g., try picking an unused but
meaningful handle for any popular social networking service). Hierarchical naming systems
are commonly used to create names that provide uniqueness and facilitate management. A
name that is made up of a list of components is called a compound name. We see this in
names such as pathnames (/home/paul/src/qsync/main.c) and in Internet domain names
(www.cs.rutgers.edu).
The growth of hosts on the Internet led to the creation of a hierarchical namespace of domain
names. A domain is just an administrative grouping to manage names. A domain name is a
set of textual names separated by dots and organized right to left, with the top of the
hierarchy being the rightmost name. In the domain name www.cs.rutgers.edu, www is a
machine under cs, which is under rutgers, which is under the edu domain.
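Because the hierarchy is read right to left, the chain of enclosing domains for a name can be computed just by splitting on dots. The sketch below (the function name is hypothetical) lists them from the root downward; it ignores details such as escaped dots inside labels.

```python
def domain_hierarchy(name: str) -> list:
    """List the domains that enclose `name`, from the root of the tree down to `name`."""
    labels = name.rstrip(".").split(".")  # a trailing dot marks a fully qualified name
    path = ["."]  # the root of the hierarchy
    # Walk the labels right to left: edu, rutgers.edu, cs.rutgers.edu, ...
    for i in range(len(labels) - 1, -1, -1):
        path.append(".".join(labels[i:]))
    return path


print(domain_hierarchy("www.cs.rutgers.edu"))
# ['.', 'edu', 'rutgers.edu', 'cs.rutgers.edu', 'www.cs.rutgers.edu']
```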
Internet domain names form an arbitrarily deep tree-structured hierarchy that allows us to
partition the management of computer names. For instance, rutgers is assigned a name
under edu, which is a top-level domain reserved for educational institutions. This doesn’t
conflict with other places where rutgers might be used, such as rutgers.com, rutgers.net,
or rutgers.party, each of which can belong to completely different organizations.
Rutgers can then create sub-domains within its rutgers.edu namespace to allow different
groups to choose whatever they want under that part of the name.

Top-level domains
At the top of the hierarchy, under the root, we have top-level domains. There are three
categories of top-level domains:

1. Country code domains contain two-letter country code names, such as de for Germany, es for
Spain, uk for the United Kingdom, or ke for Kenya.
2. Internationalized domain name (IDN) country code domains are top-level domains that are
displayed in a country’s native script. For example, .中国 for China, .ελ for Greece, and
.پاکستان for Pakistan.

3. Finally, generic top-level domains include traditional ones like .com, .edu, and .org and all
the newer ones like .party, .audio, and so on. These domains also include names in different
languages.

Currently, there are 1,589 top-level domains. The Internet Assigned Numbers Authority
(IANA) delegates the management of various domains to different organizations. Each top-
level domain has an administrator who is in charge of it. The IANA itself only keeps track of
the root servers. These root servers tell you who to contact for information about top-level
domains.

Shared registration
Domain name allocation and management is done through a system of shared registration.
The domain name registry is the master database of all domain names that are registered
under a top-level domain.
The domain name registry operator is the company that is in charge of this database. These
operators run a NIC – a network information center – that tracks information about specific
domains. The list of registry operators can be found at icann.org.
Then there’s the domain name registrar. This is the company that you use to register a
domain name. There can be many registrars for each top-level domain and each registrar can
handle registrations for multiple top-level domains. The registrars consult and update the
master database that’s managed at the Registry Operator’s NIC. The database of domain
name registrars can be found at iana.org.
Currently, 2,661 registrars provide registration services for various domains. Of these, 1,202
are registrars operated by DropCatch.com, a company that collects expiring domains. DropCatch
runs so many registrars because the domain name registries allow each registrar to contact them
only at a limited frequency. Operating many registrars lets DropCatch query the registries
almost constantly and pick up domain names as soon as they expire.

The registrar you choose becomes the designated registrar for your domain. It’s the company
you go through for any changes since you cannot contact the registry directly. The registry
operator keeps the central registry database for the top-level domain. Only the designated
registrar, the company you registered your domain name with, can make changes for that
domain name unless you invoke a domain transfer to another registrar.

For example, the company Namecheap is the designated registrar for the domain
poopybrain.com and Verisign is the registry operator for the .com top-level domain. This
means that Namecheap sends information about poopybrain.com to Verisign.
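The division of responsibilities in shared registration can be modeled as a registry that accepts changes to a domain only from its designated registrar. This is a toy sketch, not how any real registry is implemented: the `Registry` class, its methods, and the name servers under example.net are hypothetical, and a real transfer involves authorization steps that the single `transfer` call skips.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy model of a registry operator's master database for one top-level domain."""

    tld: str
    # domain -> {"registrar": designated registrar, "ns": list of name servers}
    entries: dict = field(default_factory=dict)

    def register(self, registrar: str, domain: str, name_servers: list) -> None:
        if not domain.endswith("." + self.tld):
            raise ValueError(f"{domain} is not under .{self.tld}")
        if domain in self.entries:
            raise ValueError(f"{domain} is already registered")
        # The registrar used for registration becomes the designated registrar.
        self.entries[domain] = {"registrar": registrar, "ns": list(name_servers)}

    def update_name_servers(self, registrar: str, domain: str, name_servers: list) -> None:
        entry = self.entries[domain]
        if entry["registrar"] != registrar:
            raise PermissionError(f"{registrar} is not the designated registrar for {domain}")
        entry["ns"] = list(name_servers)

    def transfer(self, domain: str, new_registrar: str) -> None:
        # Real transfers require authorization from the domain holder; omitted here.
        self.entries[domain]["registrar"] = new_registrar


# Verisign operates the .com registry; Namecheap is the designated registrar.
com = Registry("com")
com.register("Namecheap", "poopybrain.com", ["ns1.example.net"])
```

Any other registrar that tries to change poopybrain.com gets a `PermissionError` until a transfer makes it the designated registrar.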

Mapping names to addresses


The problem that we need to solve now is that we have two completely different things: IP
addresses and domain names. They are assigned separately and are generally unrelated to
each other.

We need a way to be able to resolve human-friendly domain names into IP addresses that
software can use to send and receive data.

Original solution
The original solution, as we saw, was to download the file containing the list of all computer
names on the Internet along with their corresponding addresses onto your own system.
Then, local software on your system can search for a name and find the address.

This was the system in place throughout the 1970s and 1980s. The file would be downloaded
via FTP from the Network Information Center (NIC) at the Stanford Research Institute (SRI).

Of course, this solution did not scale to millions of hosts on the Internet. Not only would the
file get big, but there is also a lot of churn in the data: hosts are constantly being added and
deleted, and many addresses change frequently.

The Domain Name System


The Domain Name System (DNS) was designed to serve as a planet-wide distributed
database that stores information about domain names and enables hosts on the Internet to
query them. It’s built as a hierarchy of name servers. A name server runs a service where
you give it a name and it gives you information about the name.
DNS is an application-layer protocol, and the lower layers of the Internet protocol stack do
not depend on it: IP routing, TCP, UDP, and the sockets interface work strictly with IP
addresses. DNS is built for humans. Computers at the edge of the network resolve names into
addresses and, after that, the network only uses addresses.
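From a program's point of view, resolving a name is one library call made before any traffic goes to the destination. In Python that call is `socket.getaddrinfo`, which consults the system's resolver. The helper name `ipv4_addresses` is hypothetical; given a numeric address such as 127.0.0.1, getaddrinfo parses it directly instead of sending a DNS query, so the example runs without network access.

```python
import socket


def ipv4_addresses(name: str, port: int = 80) -> list:
    """Resolve `name` to a de-duplicated list of IPv4 address strings.

    getaddrinfo uses the system resolver (hosts file, cache, DNS). Everything
    after this call, such as connect(), works only with the numeric addresses.
    """
    results = socket.getaddrinfo(name, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


# A numeric address needs no lookup; a name such as "www.cs.rutgers.edu"
# would trigger a real DNS query.
print(ipv4_addresses("127.0.0.1"))  # ['127.0.0.1']
```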

No relationship between names and addresses


It’s useful to underscore that no relationship exists between names and addresses. You can
define any name to point to any address or as an alias to any other system. For instance, the
domain cs.poopybrain.com is an alias for cs.rutgers.edu. It can also be configured to resolve to
the IP address for cs.rutgers.edu or any other system on the planet. That mapping is up to the
owner of poopybrain.com, not rutgers.edu, which owns the destination address.

DNS provides…
DNS servers answer queries for various types of information about domain names. Some of
the data they provide includes:

Addresses
Perhaps most importantly, they give us the IP addresses that correspond to a
name (A records for IPv4 and AAAA records for IPv6).

Aliases
They can also provide aliases. These are called canonical name (CNAME) records,
where you specify that one name really refers to another name.

Name servers
They identify name servers (NS records). These are other DNS servers that tell
you where to go for more information about that domain.

Mail servers
They give you the names of mail servers for that domain (MX records).

Text data
They can provide arbitrary other data in text (TXT) records.
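These record types can be pictured as a table keyed by name and record type. In this minimal sketch the names cs.poopybrain.com and cs.rutgers.edu follow the alias example earlier in the text, but every address and server name is a placeholder (192.0.2.0/24 is a documentation range, and the `.example` names are hypothetical). The `query` function follows CNAME aliases until it reaches a name that has the requested record type; a real server would also include the CNAME record itself in its response.

```python
from typing import Dict, List, Tuple

# Toy record store keyed by (name, record type). Addresses are placeholders from the
# 192.0.2.0/24 documentation range; the .example host names are hypothetical.
RECORDS: Dict[Tuple[str, str], List[str]] = {
    ("cs.rutgers.edu", "A"): ["192.0.2.80"],
    ("cs.poopybrain.com", "CNAME"): ["cs.rutgers.edu"],
    ("rutgers.edu", "NS"): ["ns1.rutgers.example", "ns2.rutgers.example"],
    ("rutgers.edu", "MX"): ["10 mail.rutgers.example"],  # preference value, then host
    ("rutgers.edu", "TXT"): ["any free-form text"],
}


def query(name: str, rtype: str, max_aliases: int = 8) -> List[str]:
    """Return the `rtype` records for `name`, following CNAME aliases if needed."""
    for _ in range(max_aliases + 1):
        answer = RECORDS.get((name, rtype))
        if answer:
            return answer
        alias = RECORDS.get((name, "CNAME"))
        if not alias:
            return []  # no record of that type exists for this name
        name = alias[0]  # the alias really refers to another name
    raise RuntimeError("alias chain too long")


print(query("cs.poopybrain.com", "A"))  # ['192.0.2.80']
```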

DNS servers enable load distribution because you can have lots of name servers that can
handle queries for the same domain. DNS servers cache previous lookups to return
responses faster the next time someone looks up the same domain name.
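In real DNS each record carries a time-to-live (TTL), set by the zone’s administrator, that bounds how long it may be cached. The sketch below is a minimal TTL cache under that assumption. The class name is hypothetical, and the injectable `clock` exists only so that expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """Minimal cache of lookup results, each kept for a time-to-live (TTL) in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, record type) -> (expiry time, answer)

    def put(self, name, rtype, answer, ttl):
        self._entries[(name, rtype)] = (self._clock() + ttl, answer)

    def get(self, name, rtype):
        item = self._entries.get((name, rtype))
        if item is None:
            return None
        expiry, answer = item
        if self._clock() >= expiry:
            del self._entries[(name, rtype)]  # stale entry: force a fresh lookup
            return None
        return answer


now = [0.0]  # fake clock value, in seconds
cache = TTLCache(clock=lambda: now[0])
cache.put("google.com", "A", ["192.0.2.1"], ttl=300)  # placeholder address
```

Until the TTL runs out, repeated lookups are answered from memory; afterwards the resolver must ask again and will pick up any change.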

They can also provide a list of IP addresses for a given domain name. This allows the client
to contact any one of several IP addresses to find available servers or to do load balancing.
Some DNS servers shuffle that list of IP addresses for successive queries so that different
clients will likely choose different addresses even if they use a simple approach such as
choosing the first address.
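One simple way to vary the order of the address list is to rotate it on each query, a technique usually called round-robin DNS. This sketch assumes rotation rather than a random shuffle and uses placeholder addresses; the class and function names are hypothetical. Even a client that always takes the first address ends up spreading its connections across all of them.

```python
from collections import deque
from typing import List


class RotatingAnswer:
    """Return the same list of addresses, rotated by one position on every query."""

    def __init__(self, addresses: List[str]):
        self._addresses = deque(addresses)

    def next_answer(self) -> List[str]:
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # the first address moves to the end for the next query
        return answer


def naive_client_choice(answer: List[str]) -> str:
    # A simple client that always connects to the first address it receives.
    return answer[0]


rr = RotatingAnswer(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
choices = [naive_client_choice(rr.next_answer()) for _ in range(3)]
print(choices)  # ['192.0.2.1', '192.0.2.2', '192.0.2.3']
```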

DNS is distributed & hierarchical


DNS is structured to mirror the domain hierarchy of domain names. The root of the
hierarchy knows about the DNS servers that are responsible for top-level domains.

Each top-level DNS server knows about the DNS servers for each domain immediately
beneath it: the edu DNS servers will know about the DNS servers
for rutgers.edu, columbia.edu, nyu.edu, and so on.
Descending deeper into the hierarchy, DNS servers are responsible for names within
individual organizations.

Authoritative servers

DNS has a concept of zones and authoritative servers. A zone is the part of the domain tree
under a node that is managed by one entity. For instance, rutgers.edu is a
zone.
An authoritative name server for a zone is a DNS server that is configured with that zone’s
data, as opposed to some other DNS server that might merely have cached information about
that zone.
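Which zone, and therefore which set of authoritative servers, is responsible for a name can be decided by finding the longest suffix of the name that is a zone. This sketch assumes a hypothetical set of zones (it treats cs.rutgers.edu as a separately delegated zone purely for illustration), and the function name is made up.

```python
# Hypothetical zones, each administered by a single entity. cs.rutgers.edu is
# assumed to be its own zone for illustration only.
ZONES = {".", "com", "edu", "rutgers.edu", "cs.rutgers.edu", "poopybrain.com"}


def enclosing_zone(name: str, zones: set = ZONES) -> str:
    """Return the most specific zone containing `name` (the longest matching suffix)."""
    labels = name.rstrip(".").split(".")
    # Try the full name first, then drop one label from the left at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return "."  # every name falls under the root zone


print(enclosing_zone("www.cs.rutgers.edu"))  # cs.rutgers.edu
print(enclosing_zone("www.rutgers.edu"))     # rutgers.edu
```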

Finding your way…

Suppose you want to contact a system at Rutgers. You need its address. That’s handled by a
DNS server that Rutgers administers. How do we find it?

The domain registry helps us here. When you register a domain with a domain registrar, you
provide it with the addresses of DNS servers that can answer queries about the domain. The
domain registrar stores this information at the domain registry.

Root name servers

We know that the information about some computer in Rutgers is sitting in a DNS server
that Rutgers administers. That doesn’t help us if we don’t know how to get to that DNS
server. To find the server we need, we can start at the root of the DNS hierarchy.

Root name servers can tell you the addresses of DNS servers responsible for all the top-level
domains. If you ask any root DNS server about a particular name, it will reply with the
addresses of the DNS servers that are responsible for that name’s top-level domain.
The root servers have names like A.ROOT-SERVERS.NET, B.ROOT-SERVERS.NET, and so on.
...
DNS Query types

There are two ways queries are done via DNS: iterative resolution and recursive resolution.

Iterative resolution
With iterative resolution, a DNS server returns either an answer or a referral to another DNS
server.
A referral is a message that tells you about a DNS server at a lower level in the domain
hierarchy. The DNS client must process these referrals by submitting queries to those
servers.
The advantage of iterative resolution is that each component is stateless. It either has an
answer, provides a referral, or it fails the query.
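The client’s side of iterative resolution is a loop that keeps following referrals. In this simulation each server is a dictionary, and `ask` is one stateless query that returns an answer, a referral, or a failure. The server names, the delegation data, and the address 192.0.2.80 are all hypothetical; a real resolver would send DNS messages over the network instead of reading a dictionary.

```python
from typing import List, Optional, Tuple

# Simulated name servers: referrals map a domain suffix to a lower-level server, and
# answers hold final addresses. All names and addresses here are hypothetical.
SERVERS = {
    "root": {"referrals": {"edu": "edu-server"}, "answers": {}},
    "edu-server": {"referrals": {"rutgers.edu": "rutgers-server"}, "answers": {}},
    "rutgers-server": {"referrals": {}, "answers": {"www.cs.rutgers.edu": "192.0.2.80"}},
}


def ask(server: str, name: str) -> Tuple[str, Optional[str]]:
    """One stateless query: ("answer", address), ("referral", server), or ("fail", None)."""
    data = SERVERS[server]
    if name in data["answers"]:
        return "answer", data["answers"][name]
    for suffix, child in data["referrals"].items():
        if name == suffix or name.endswith("." + suffix):
            return "referral", child
    return "fail", None


def resolve_iteratively(name: str, start: str = "root",
                        max_steps: int = 10) -> Tuple[Optional[str], List[str]]:
    """Client-side loop that follows referrals; also returns the servers it contacted."""
    server = start
    contacted = [server]
    for _ in range(max_steps):
        kind, value = ask(server, name)
        if kind == "answer":
            return value, contacted
        if kind == "fail":
            return None, contacted
        server = value  # the client, not the server, follows the referral
        contacted.append(server)
    return None, contacted


print(resolve_iteratively("www.cs.rutgers.edu"))
# ('192.0.2.80', ['root', 'edu-server', 'rutgers-server'])
```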

Recursive resolution
Recursive name resolution isn’t a great name because we’re not really using recursion.
Recursive resolution means that a name server is willing to take on the responsibility of fully
resolving the name so the client doesn’t have to deal with referrals. Basically, it does a
sequence of iterative resolutions until it finds a name server that gives it the answer or it
gives up if it’s unable to find one.

The DNS server never sends back referrals to the client that made the request. Instead, it will
query all the needed DNS servers to find the domain name, handle the referrals itself, and
then return either the answer or a failure to the client that made the query.

The good part about recursive resolution is that the client doesn’t need to deal with referrals
and DNS servers can cache all the intermediate results they discovered to make query
resolution quicker in the future.

While recursive resolution makes life easier for the process that is making the request, the
disadvantage of this approach is that the name server has more work to do. It may have to
issue multiple queries and process responses to resolve the domain name, maintaining the
context of the query until the response is sent.
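Recursive resolution moves the referral-following loop into the name server and lets it remember what it learned. In this sketch the servers are simulated as dictionaries (hypothetical data, no network traffic), and the resolver caches both final answers and intermediate referrals, so a later query for another name under rutgers.edu can skip the root and edu servers. TTLs and error handling are omitted for brevity.

```python
from typing import Dict, Optional

# Simulated name servers: referrals map a domain suffix to a child server, and answers
# hold final addresses. All names and addresses are hypothetical placeholders.
SERVERS = {
    "root": {"referrals": {"edu": "edu-server"}, "answers": {}},
    "edu-server": {"referrals": {"rutgers.edu": "rutgers-server"}, "answers": {}},
    "rutgers-server": {
        "referrals": {},
        "answers": {"www.cs.rutgers.edu": "192.0.2.80", "www.rutgers.edu": "192.0.2.1"},
    },
}


class RecursiveResolver:
    """Resolves names fully on the client's behalf and caches intermediate results."""

    def __init__(self) -> None:
        self.answer_cache: Dict[str, str] = {}    # name -> address
        self.referral_cache: Dict[str, str] = {}  # domain suffix -> responsible server
        self.queries_sent = 0

    def _closest_known_server(self, name: str) -> str:
        # Start from the most specific server already learned for a suffix of `name`.
        labels = name.split(".")
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in self.referral_cache:
                return self.referral_cache[suffix]
        return "root"

    def resolve(self, name: str, max_steps: int = 10) -> Optional[str]:
        if name in self.answer_cache:
            return self.answer_cache[name]
        server = self._closest_known_server(name)
        for _ in range(max_steps):
            self.queries_sent += 1
            data = SERVERS[server]
            if name in data["answers"]:
                self.answer_cache[name] = data["answers"][name]
                return self.answer_cache[name]
            next_server = None
            for suffix, child in data["referrals"].items():
                if name == suffix or name.endswith("." + suffix):
                    self.referral_cache[suffix] = child  # remember the referral
                    next_server = child
                    break
            if next_server is None:
                return None  # the client sees only an answer or a failure, never a referral
            server = next_server
        return None
```

The first lookup of www.cs.rutgers.edu costs three queries; a following lookup of www.rutgers.edu costs one, because the referral to the rutgers.edu server is already cached.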

Top-level DNS servers handle only iterative queries. They want to remain stateless,
handle simple local lookups, and be able to support a heavy query volume with minimal
effort.

Resolvers in action
Most computers run a service called a DNS stub resolver. This is a mini DNS server that
stores and checks cached lookups so that the computer does not have to waste time
contacting a remote service each time it needs to find the address of google.com or any other
frequently accessed domain. Prior to issuing a remote query, the stub resolver also checks a
local hosts file (/etc/hosts on Linux and macOS, C:\Windows\System32\drivers\etc\hosts on
Windows) to see if there are any pre-configured name-to-address mappings.
If an answer cannot be found in the cache or in the hosts file, the stub resolver then contacts a
DNS server, often one provided by the ISP or a public DNS server such as Cloudflare (1.1.1.1),
Google Public DNS (8.8.8.8), Quad9 (9.9.9.9), OpenDNS (208.67.222.222), or one of several
other free DNS services.
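The lookup order of a stub resolver can be sketched in a few lines. This is a minimal sketch under stated assumptions: it checks the hosts file before the cache so that hosts entries always take precedence (the text does not fix that order), and `upstream` is a hypothetical stand-in for a query to the configured DNS server, here any function from a name to an address or None. All names and addresses are placeholders.

```python
from typing import Callable, Dict, Optional


class StubResolver:
    """Sketch of a stub resolver: hosts file, then local cache, then a remote DNS server."""

    def __init__(self, hosts: Dict[str, str], upstream: Callable[[str], Optional[str]]):
        self.hosts = hosts          # pre-configured name-to-address mappings
        self.cache: Dict[str, str] = {}
        self.upstream = upstream    # stands in for a query to the configured DNS server
        self.remote_queries = 0

    def lookup(self, name: str) -> Optional[str]:
        name = name.lower()  # DNS names are case-insensitive
        if name in self.hosts:
            return self.hosts[name]
        if name in self.cache:
            return self.cache[name]
        self.remote_queries += 1
        address = self.upstream(name)
        if address is not None:
            self.cache[name] = address  # the next lookup is answered locally
        return address


def fake_upstream(name: str) -> Optional[str]:
    # Hypothetical remote server that knows a single placeholder answer.
    return {"google.com": "192.0.2.1"}.get(name)


stub = StubResolver({"localhost": "127.0.0.1"}, fake_upstream)
```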

To summarize, DNS is a special-purpose system, but it is a great example of a distributed
software system that runs on millions of systems throughout the world and is used constantly
by everyone who accesses any Internet services.
