Fedora 21 was released yesterday. (If you haven’t already, go get it: https://getfedora.org )
This release was not as smooth for infrastructure as previous releases have been, for which I apologize.
Here’s what happened: For the last few weeks we had been seeing sporadic slowdowns in the bodhi application, but had been unable to isolate what was causing them. This last week was the Fedora Infrastructure Mirrormanager 2 / Ansible FAD, and there we added some more debugging, but still couldn’t see where the problem was. It wasn’t in bodhi itself, but somewhere in its integration with the authentication system and in reaching that system via proxy01 (our main datacenter proxy). Proxy01 seemed busier than usual, but it gets a lot of traffic anyhow. We bumped up its memory to make sure it could better cope with release day.
Then, release day: proxy02 (a server in England) started being unable to cope with load and we removed it from DNS. Then proxy01 started having problems. Since most services were slow in any case, we updated our status page to say that it was release day and to expect slowdowns. Most services (aside from bodhi) were actually up and fine, just slower than normal. Some folks took this to mean we were completely down, but this was not the case. Next release we will probably make a special banner telling people it’s release day and to expect things to be slow, but up and all working.
Finally, this morning Patrick discovered a problem in our DNS setup. It had been there all along, but the amount of traffic we had been seeing in the last few weeks, and especially on release day, made it much worse: only proxy01 and proxy02 were available in DNS for the EU. This means that EU folks would always get one of those 2 proxies, and with proxy02 removed, they would always get proxy01. There were 2 other proxies that should have been in DNS for the EU, but were not. We quickly added them, added proxy02 back in, and things have been very quiet since then. With proxy01 no longer having to handle all of the EU traffic, bodhi was happy again, and with 2 more proxies closer to the EU, EU users should be happy again. Many thanks to Patrick for finally tracking this down.
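To make the failure mode concrete, here is a rough sketch of the kind of check that would have flagged it. This is not our actual DNS tooling: the region table, the function names, and the two placeholder hosts `proxy-x` and `proxy-y` (standing in for the 2 proxies that were missing) are all made up for illustration, and the even split of traffic is a simplifying assumption about how round-robin DNS spreads clients.

```python
# Hypothetical sketch: compare the proxies we *intend* to serve per region
# with the ones actually published in DNS, and estimate the traffic share
# each live proxy ends up with. All names below are illustrative only.

def missing_from_pool(expected, published):
    """Return, per region, the proxies that should be in DNS but are not."""
    result = {}
    for region, hosts in expected.items():
        absent = sorted(set(hosts) - set(published.get(region, ())))
        if absent:
            result[region] = absent
    return result


def share_per_proxy(published, region, out_of_service=()):
    """Assume round-robin DNS splits a region's traffic evenly across live proxies."""
    live = [h for h in published.get(region, ()) if h not in out_of_service]
    if not live:
        return {}
    return {h: 1 / len(live) for h in live}


# proxy-x and proxy-y are placeholders for the two proxies that were missing.
expected = {"EU": ["proxy01", "proxy02", "proxy-x", "proxy-y"]}
published = {"EU": ["proxy01", "proxy02"]}

print(missing_from_pool(expected, published))
print(share_per_proxy(published, "EU", out_of_service=["proxy02"]))
print(share_per_proxy(expected, "EU"))
```

With only two proxies published and one of them pulled, the single remaining proxy takes the whole region; with all four published, each would take roughly a quarter under the same even-split assumption.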
Sorry for the slowdowns and issues on release day. Everything should be back to normal now and we should not have this problem on the next release.
In the last week, our master mirrors have pushed out around 50TB of data. Not bad.