dnf (or yum) and metalinks
Like yum before it, dnf also uses (by default) metalinks served by Fedora Mirrormanager mirrorlist servers. Metalinks are a great thing, but it seems like many people don’t understand them or realize what great benefits they provide, so I thought I would do this post to help.
A metalink is an XML document that includes checksums and lists of mirrors that have repodata. In Fedora you can see the metalink URL in the repo files provided by the fedora-repos rpm under /etc/yum.repos.d/, for example in fedora.repo.
Note that the metalink is fetched over https, meaning if you trust the ssl cert CA setup, you will know that it comes from the Fedora mirrorlist servers and hasn’t been intercepted or tampered with. You will also note that it’s passing some data to the mirrorlist server: what repo you want, what your arch is, and also your IP address.
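To make the request concrete, here is a minimal sketch of how such a URL is put together once the client has substituted its release and arch. The base URL and the `repo` and `arch` parameter names are assumptions based on how Fedora repo files are commonly laid out, and `build_metalink_url` is a hypothetical helper, so check your own fedora.repo for the real line.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base URL of the mirrorlist service; not quoted in this post.
METALINK_BASE = "https://mirrors.fedoraproject.org/metalink"


def build_metalink_url(releasever: str, basearch: str, repo_prefix: str = "fedora") -> str:
    """Build the URL a client would request after variable substitution."""
    query = urlencode({"repo": f"{repo_prefix}-{releasever}", "arch": basearch})
    return f"{METALINK_BASE}?{query}"


url = build_metalink_url("21", "x86_64")
parts = urlsplit(url)
params = parse_qs(parts.query)

print(url)
# The scheme shows the transport; the query shows what the client tells the server.
print(parts.scheme, params["repo"], params["arch"])
```

The IP address is not part of the query string in this sketch; the server can simply see it from the connection.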
Once yum or dnf pulls this url, it goes to mirrorlist servers. They take the request and build a metalink xml for that request based on a cache of mirrormanager data that they have.
At the top of the metalink is included the current repomd.xml file’s timestamp and checksums. Below this there may be several “alternate” checksums for repomd.xml from previous repodata. Any of these are considered valid. This is to allow for syncing time for mirrors. If an updates push has just finished, only the master mirror is likely to have the very newest repomd.xml. Allowing the previous one means people will still be able to use mirrors until they sync up to the new one. After some time, the older alternate repomd.xml is dropped, leaving only the current one.
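The "current plus alternates" rule can be sketched roughly like this. It is only an illustration: the `RepomdRecord` and `Metalink` classes are hypothetical stand-ins for the parsed XML, and it checks sha256 only, while a real metalink may carry several hash types.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RepomdRecord:
    timestamp: int  # when this repomd.xml was generated
    sha256: str     # hex digest of that repomd.xml


@dataclass
class Metalink:
    current: RepomdRecord
    alternates: List[RepomdRecord] = field(default_factory=list)


def accepted_checksums(metalink: Metalink) -> Set[str]:
    """The current checksum and every alternate are all considered valid."""
    return {metalink.current.sha256} | {a.sha256 for a in metalink.alternates}


def repomd_is_valid(data: bytes, metalink: Metalink) -> bool:
    """Check a downloaded repomd.xml against the metalink's checksums."""
    return hashlib.sha256(data).hexdigest() in accepted_checksums(metalink)


# Made-up file contents standing in for three generations of repodata.
new, previous, ancient = b"<repomd>new</repomd>", b"<repomd>previous</repomd>", b"<repomd>ancient</repomd>"
metalink = Metalink(
    current=RepomdRecord(200, hashlib.sha256(new).hexdigest()),
    alternates=[RepomdRecord(100, hashlib.sha256(previous).hexdigest())],
)

print(repomd_is_valid(new, metalink))       # mirror already synced
print(repomd_is_valid(previous, metalink))  # mirror one push behind, still accepted
print(repomd_is_valid(ancient, metalink))   # too old or tampered with, rejected
```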
Next a list of mirrors is generated: Is there some mirror that has marked itself as always preferred for your IP block? Is there a mirror that has marked itself as preferred for your ASN? Is there a mirror geographically near you? What mirrors are currently marked “up to date” (the mirrorlists have a cache, but it’s refreshed every hour and the crawler marks mirrors up to date or not all the time)? It weights things by the amount of bandwidth mirrors have indicated they are able to handle (so a fast mirror with little bandwidth doesn’t get overrun by requests). It then takes the output of this query and makes a list of mirrors in a preferred order (most preferred at the top, then down). Finally the master mirrors are added at the lowest preference to the list, so things should fall through to them if no other mirrors work.
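As a rough sketch of that ordering logic, and not the actual mirrormanager code: the `Mirror` fields, the four tiers, and the weighted shuffle below are all assumptions made for illustration. Within each tier, mirrors that advertise more bandwidth tend to sort earlier, which spreads load in proportion to capacity.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mirror:
    url: str
    bandwidth: int                  # self-reported capacity, arbitrary units
    up_to_date: bool = True
    prefers_netblock: bool = False  # claimed the client's IP block
    prefers_asn: bool = False       # claimed the client's ASN
    same_country: bool = False      # geographically near the client
    is_master: bool = False


def tier(mirror: Mirror) -> int:
    """Lower tier means more preferred."""
    if mirror.prefers_netblock:
        return 0
    if mirror.prefers_asn:
        return 1
    if mirror.same_country:
        return 2
    return 3


def weighted_shuffle(mirrors: List[Mirror], rng: random.Random) -> List[Mirror]:
    """Random order where higher bandwidth tends to come first."""
    keyed = [(rng.random() ** (1.0 / max(m.bandwidth, 1)), m) for m in mirrors]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in keyed]


def order_mirrors(mirrors: List[Mirror], rng: Optional[random.Random] = None) -> List[Mirror]:
    rng = rng or random.Random()
    usable = [m for m in mirrors if m.up_to_date and not m.is_master]
    masters = [m for m in mirrors if m.is_master]
    ordered: List[Mirror] = []
    for t in range(4):
        ordered.extend(weighted_shuffle([m for m in usable if tier(m) == t], rng))
    return ordered + masters  # masters go last, as the fallback


mirrors = [
    Mirror("https://far.example/", bandwidth=1000),
    Mirror("https://stale.example/", bandwidth=1000, up_to_date=False),
    Mirror("https://isp.example/", bandwidth=100, prefers_netblock=True),
    Mirror("https://near.example/", bandwidth=100, same_country=True),
    Mirror("https://master.example/", bandwidth=10, is_master=True),
]
ordered_urls = [m.url for m in order_mirrors(mirrors, random.Random(0))]
print(ordered_urls)
```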
Once you have the metalink, yum or dnf then tries the mirrors in order of preference, looking for the repomd.xml. If it finds one with a matching checksum, it uses that mirror. The repomd.xml contains checksums for all the rest of the repodata, and thus for all the packages in the repo. So, if you have the correct/valid metalink, you have the correct/valid repomd.xml, which means even if you download over ftp or rsync or http the packages you get MUST match the valid checksums. Of course, additionally, after you download and before you update or install, the GPG signatures on the signed packages are checked, but if you use the metalink you will never need to worry about a corrupt/invalid/unsigned package in the first place.
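The client side of this can be sketched as a simple loop. This is not how yum or dnf are actually implemented: `pick_mirror` is a hypothetical helper, the `fetch` callable is injected so the example runs without a network, and the `repodata/repomd.xml` path under each mirror URL is an assumption about the repo layout.

```python
import hashlib
from typing import Callable, Iterable, Optional, Set, Tuple


def pick_mirror(
    mirror_urls: Iterable[str],
    valid_sha256: Set[str],
    fetch: Callable[[str], bytes],
) -> Optional[Tuple[str, bytes]]:
    """Return the first mirror whose repomd.xml matches a trusted checksum."""
    for base in mirror_urls:
        try:
            data = fetch(base.rstrip("/") + "/repodata/repomd.xml")
        except OSError:
            continue  # mirror unreachable, try the next one
        if hashlib.sha256(data).hexdigest() in valid_sha256:
            return base, data
        # Checksum mismatch: the mirror is stale or tampered with, so skip it.
    return None


# Fake mirrors standing in for real network access.
good = b"<repomd>current</repomd>"
fake_files = {
    "http://stale.example/repodata/repomd.xml": b"<repomd>old</repomd>",
    "ftp://ok.example/repodata/repomd.xml": good,
}


def fake_fetch(url: str) -> bytes:
    if url not in fake_files:
        raise OSError("unreachable: " + url)
    return fake_files[url]


trusted = {hashlib.sha256(good).hexdigest()}
order = ["https://down.example/", "http://stale.example/", "ftp://ok.example/"]
chosen = pick_mirror(order, trusted, fake_fetch)
print(chosen[0] if chosen else "no usable mirror")
```

Note that the mirror that wins here is plain ftp. That is fine, because the trusted checksum came from the metalink over https, not from the mirror itself.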
I’ve seen a number of people lately suggesting you should just replace the metalink in your repos with a direct link to a local mirror you like. This is really a bad idea except as a very short term workaround. If you do that:
- If the mirror you are pointing to is out of date or needs some kind of maintenance, mirror admins can remove it or mark it not up to date, and all the metalink-using folks will no longer hit it, but you will.
- You no longer have a way to tell if the metadata on that mirror has been corrupted/tampered with. True, the packages are still signed, but an evil mirror might give you known vulnerable (but signed in the past) packages, or withhold packages with fixes from you. You have no way of telling.
- If the mirror you are pointing to is http or ftp, someone could man-in-the-middle the connection and tamper with what you get.
Finally, now that we have mirrormanager 2 deployed we are able to start looking at some enhancements for it. Some ideas I have had:
- A way to tell the mirrorlists you only want https (or http/https/rsync ok, but no ftp), etc. This should be pretty easy to do.
- Consider just dropping ftp. It’s a horrible protocol, perhaps we don’t need to keep it alive anymore.
- More comments in the metalink that users get, to help us debug metalink problems, i.e., how it decided what was included, etc.
- We have already been working on lots of great improvements on the crawler to do quicker crawls and detect out of date mirrors more easily, including a canary mode (check just repomd.xml), multithreading rsync checks and more.
- I’d like a way to tell what mirrors have iso/qcow2/raw images. We point people to mirrors to download those, but sometimes they get an otherwise up-to-date mirror that doesn’t carry images, and get a 404. If we kept track of which exact mirrors had that exact image, we could direct people more accurately.
- <your ideas here>: please file ideas at https://fedorahosted.org/mirrormanager/
I’d like to thank Pierre-Yves Chibon and Adrian Reber for all their recent work writing, rolling out and debugging mirrormanager2. Thanks!