One of the classes that I’ve taken this semester is Advanced Distributed Systems. The course is primarily a research seminar wherein the students are required to read a couple of research papers for each class and participate in a two-hour discussion guided by the professor. My goal in this series of blog posts is to summarize the papers in an approachable manner, primarily to test my own understanding of the topics.
The first paper discussed in this class was on the Development of the DNS. This paper examines the ideas behind the initial design of the DNS in 1983 and discusses the evolution of these ideas into a working implementation. Why is this paper important in a distributed systems class? Before we answer that, let's try to define a distributed system.
A distributed system is a set of independent machines that coordinate over a network to achieve a common goal.
If you look at the definition above carefully (particularly the emphasized words), you’ll realize that the DNS indeed behaves as a distributed system. In fact, the DNS is one of the earliest examples of a distributed system deployed at scale.
Before the DNS, a file called HOSTS.TXT was used for publishing the mapping between host names and addresses. Eventually, as the number of users and workstations grew, it became harder and harder to transmit the file due to its increasing size.

One of the design goals of the new system, besides replacing HOSTS.TXT, was the ability for the system to be independent of network topology and to be capable of encapsulating other name spaces. The namespace itself is organized as a tree: www.google.com is a node in that tree, and the name is made up of the labels www, google, com and the implicit root label (the trailing dot). This is demonstrated more clearly in the DNS resolution section below.

The rest of the paper talks about the issues with implementing the DNS and gradually migrating hosts from HOSTS.TXT to the new name servers. Their experience with performance, for example, is quite indicative of how hard it is to benchmark systems that have multiple components, all of which keep changing quickly.
A related surprise was the difficulty in making reasonable measurements of DNS performance. We had planned to measure the performance of DNS components in order to estimate costs for future enhancement and growth, and to guide tuning of existing retransmission intervals, but the measurements were often swamped by unrelated effects due to gateway changes, new DNS software releases, and the like.
A negative cache is a cache that also stores “negative” responses, i.e. failures. This means a program can remember that a lookup failed, and may keep returning that failure for a while even after the cause has been corrected. In DNS, negative caching is a feature, and the authors make a strong case for it in the paper.
Our conclusion is that any naming system that relies on caching for performance may need caching for negative results as well. Such a mechanism has been added to the DNS as an optional feature, with impressive performance gains in cases where it is supported in both the involved name servers and resolvers.
The primary reason for maintaining a negative cache is performance. As the authors observed, roughly one out of every four DNS queries was for a negative result, i.e. asking resolvers for hosts or data that did not exist. To ensure that this volume of queries does not impact the system, servers and resolvers cache these negative responses, each with its own TTL.
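To make the idea concrete, here is a minimal sketch of such a cache in Python. It is not from the paper or from any real DNS implementation; the class and names are made up purely for illustration. The point is simply that a failed lookup is stored with a TTL of its own, just like a successful one.

import time

class ResolverCache:
    """Toy resolver cache that remembers failures (negative answers) too."""

    def __init__(self):
        self._entries = {}  # name -> (value_or_None, expiry_timestamp)

    def put(self, name, value, ttl):
        # value is None for a negative answer ("this name does not exist");
        # the TTL bounds how long we keep serving it after the cause is fixed.
        self._entries[name] = (value, time.time() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None                    # miss: ask an upstream server
        value, expires_at = entry
        if time.time() > expires_at:
            del self._entries[name]
            return None                    # expired: treat as a miss
        return ("hit", value)              # value is None for a cached failure

# Roughly one in four queries was for a name that did not exist, so caching
# the failure avoids repeating the full lookup every time it is asked again.
cache = ResolverCache()
cache.put("doesnotexist.example.com", None, ttl=300)
print(cache.get("doesnotexist.example.com"))   # ('hit', None)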
The most interesting part of the paper for me was the idea of root servers - the servers which form the apex of the domain name system. In the absence of a caching mechanism, every DNS query would have to flow via one of these root servers. How many root servers are there? Let's run dig to find out.
$ dig NS +noadditional +noquestion +nocomments +nocmd +nostats . @8.8.8.8
NS a.root-servers.net
NS b.root-servers.net
NS c.root-servers.net
NS d.root-servers.net
NS e.root-servers.net
NS f.root-servers.net
NS g.root-servers.net
NS h.root-servers.net
NS i.root-servers.net
NS j.root-servers.net
NS k.root-servers.net
NS l.root-servers.net
NS m.root-servers.net
So there are just 13 servers responsible for the whole internet? Well no, this does not mean there are 13 physical servers; each operator uses multiple servers distributed geographically to service the requests.
Who operates these servers? They are collectively operated by universities, companies and government bodies. To learn more about who operates which server, see the entry on Wikipedia.
Now that we know the overall architecture of the DNS, let's see if we can figure out how a typical DNS resolution happens. Although the full cycle spans multiple steps, in practice heavy caching at each step ensures that no single name server is inundated with requests. For this example, however, let's assume that all our caches are purged and hence our query goes via the root.
Suppose the DNS query is for: www.google.com
www . google . com .

www    -> subdomain
google -> domain
com    -> top-level domain (TLD)
.      -> root (implicit)
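With every cache purged, a resolver has to walk this name from right to left: it asks a root server, which refers it to the com servers, which refer it to Google's name servers, which finally answer for www.google.com. Below is a rough sketch of that iterative walk using the dnspython library (assumed to be installed; the hard-coded 198.41.0.4 is a.root-servers.net). A real resolver also handles retries, multiple servers per zone, CNAMEs and caching, all of which this sketch skips.

import dns.message
import dns.query
import dns.rdatatype

def resolve(qname, root_ip="198.41.0.4"):   # 198.41.0.4 is a.root-servers.net
    server = root_ip
    while True:
        print("asking", server, "for", qname)
        # use_edns=0 advertises a larger UDP payload so referrals fit in one reply
        query = dns.message.make_query(qname, dns.rdatatype.A, use_edns=0)
        response = dns.query.udp(query, server, timeout=3)

        # An answer section means an authoritative server replied: we are done.
        if response.answer:
            return [rr.address
                    for rrset in response.answer
                    for rr in rrset
                    if rr.rdtype == dns.rdatatype.A]

        # Otherwise this is a referral: the authority section names the servers
        # one level down the tree (root -> com -> google.com) and the additional
        # section usually carries their ("glue") addresses.
        glue = [rr.address
                for rrset in response.additional
                for rr in rrset
                if rr.rdtype == dns.rdatatype.A]
        if not glue:
            raise RuntimeError("referral without glue; a real resolver would "
                               "now resolve the NS names separately")
        server = glue[0]

print(resolve("www.google.com."))

On a network that allows direct queries to port 53, this should print three hops - a root server, a com server, and one of Google's name servers - before returning the addresses.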
Overall, the paper gives an interesting historical perspective on why the DNS came to be and the issues that came to light during its implementation and deployment. The paper, surprisingly, is also littered with good advice about software engineering and the human side of building software. I will end this writeup with my favorite sentence from the paper -
Documentation should always be written with the assumption that only the examples are read.
Till next time!