We had a DDoS hit our DNS servers a few weeks ago, so I thought I would write up what happened for anyone interested.
First, a bit of background: why do we (Fedora Infrastructure) run DNS servers? We run them so users can resolve our domains. It’s worth noting that we don’t provide recursive servers that will answer queries for any domain, just authoritative servers for the domains we manage. Doing this allows us to quickly update things (which we depend on to take proxy servers in and out of rotation) as well as make sure we have DNSSEC working and other configuration in place. If we were setting this up these days, we might very well go with a trusted third-party provider, but we predate those really existing, and for the most part it’s worked fine for us. We have a number of DNS servers: two in our main IAD2 datacenter and the rest spread out across the various other places where we have a presence.
So, what’s a DDoS (Distributed Denial of Service)? Basically, it’s flooding something with requests, not from one point, IP, or subnet, but from a distributed network of machines. You can’t simply block requests from a single IP or subnet, because the requests are coming from everywhere!
The DDoS that hit us started pretty sharply. We started getting alerts from our monitoring, and then… couldn’t reach any of our machines in our main datacenter. Luckily we have close contact with the networking folks there, and they were able to see that we had hit the connection limit on our firewall there (10 million connections). All of those connections were going to our DNS servers, so we had the networking folks clear the connections and block DNS to our main datacenter until we got a handle on things. That got almost everything working again, but the DNS servers elsewhere were still getting flooded, leaving some users unable to resolve our domains.
I shifted to working with the remaining DNS servers outside our main datacenter. Looking at the requests that were coming in, it seemed that many or most were from legit-looking nameservers or IPs. So this attack was in fact an indirect one: the attackers were asking recursive DNS servers to query us for domains we controlled. Blocking those sources would have done nothing except cut off the legit users mixed in behind the same recursive servers. So I took several tacks. First, I increased the limits we had in place to allow those servers to process a lot more connections at once (from 1000 or so to 10000). Second, I set up some limits per zone. All the queries were for our old fedorahosted.org domain, which had a wildcard (i.e., anything.fedorahosted.org would resolve). Limiting per zone (which BIND can now do) helped other, more important domains get answered while the queries against fedorahosted.org were limited. Finally, I removed the wildcard on fedorahosted.org (we haven’t used that domain in many years, so nothing should be using it now).
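For anyone curious how limiting per zone helps, here is a rough Python sketch of the general idea. To be clear, this is not how BIND implements it and not our actual configuration: the `TokenBucket` and `ZoneLimiter` names and the rate numbers are made up for illustration. Each limited zone gets its own budget of queries per second, so a flood against one zone can't use up the capacity needed to answer the others.

```python
class TokenBucket:
    """Allows `rate` queries per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst      # start with a full bucket
        self.updated = now

    def allow(self, now):
        # Refill according to the time elapsed since the last query.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ZoneLimiter:
    """Hypothetical per-zone limiter: zones without a configured limit are never throttled."""

    def __init__(self, limits):
        # limits maps a zone name to a (rate, burst) pair.
        self.buckets = {zone: TokenBucket(rate, burst)
                        for zone, (rate, burst) in limits.items()}

    def zone_for(self, qname):
        # Find the longest configured zone that the query name falls under.
        labels = qname.lower().rstrip(".").split(".")
        for i in range(len(labels)):
            candidate = ".".join(labels[i:])
            if candidate in self.buckets:
                return candidate
        return None

    def allow(self, qname, now):
        zone = self.zone_for(qname)
        if zone is None:
            return True
        return self.buckets[zone].allow(now)


# Illustrative numbers only: 2 queries per second for the old zone.
limiter = ZoneLimiter({"fedorahosted.org": (2.0, 2.0)})
flood = [limiter.allow(f"random{i}.fedorahosted.org", now=0.0) for i in range(5)]
other = limiter.allow("fedoraproject.org", now=0.0)
```

In this sketch, the random names under fedorahosted.org all draw from the same bucket, which is why limiting by zone works where limiting by source address would not: the sources look legit, but the target zone is always the same.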
After doing those things (or perhaps by coincidence?) the attack stopped, just as our servers became fully able to handle the queries coming in. So, hopefully we are better set if this happens again. If it becomes too common, we may have to look at moving DNS off to a third party that specializes in handling this kind of thing.
Finally, several people asked me what motive the people doing this might have. I have no idea; we are a humble Linux distribution here. I don’t think we will ever know more.