How to debug Fedora rawhide compose problems
From time to time rawhide composes fail and are not announced or synced to mirrors.
In the past this would happen only if the very basic setup (a mock chroot with the ‘buildsys-build’ group installed in it) broke. Additionally, in the past rawhide composes where many deliverables failed to compose were still synced out and announced, leading to days when no images were available until the issue was fixed.
Now, with the latest version of pungi (the tool that composes Fedora releases, including rawhide), a compose can fail if some deliverables (those marked in the configuration as not failable) didn’t complete. So, while rawhide can fail more easily, it’s also much easier to revert a change that broke images and get that fixed before it lands, and images should always be available.
So, how can you tell if a rawhide compose (or some part of it) failed, and why? All the pungi logs are available, and since all the builds take place in koji, anyone can look at them as well. Rawhide composes are of course fedmsg enabled, so you want to look for the https://apps.fedoraproject.org/datagrepper/raw?topic=org.fedoraproject.prod.pungi.compose.status.change topic. A compose can finish in one of 3 states:
- FINISHED – This means the compose finished and everything in it completed successfully. I am not sure we have yet seen this status in real life. 😉
- FINISHED_INCOMPLETE – This means the compose finished and only failable things failed. This is the “normal” status we see day to day.
- DOOMED – This means the compose failed its initial very basic setup and/or some deliverable marked as not failable failed. When this happens, the compose isn’t synced out or advertised. This is the status where we need to find out what caused the problem, fix it, and either restart the compose or wait for the next day’s compose. In IRC or on fedmsg you may see this status as “failed in a horrible fire”, as that’s what our fedmsg translates it to.
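To make the three states concrete, here is a minimal Python sketch of how you might sort status messages into "synced" and "needs a look". The state names come from the list above; the message shape (a dict with `status` and `location` keys) is my assumption for illustration, so check a real message on datagrepper before relying on it.

```python
# The three end states described above, with a short paraphrase of each.
COMPOSE_STATES = {
    "FINISHED": "everything completed successfully",
    "FINISHED_INCOMPLETE": "only failable deliverables failed",
    "DOOMED": "basic setup or a non-failable deliverable failed",
}


def was_synced(status):
    """Return True if a compose with this end state is synced out and announced."""
    if status not in COMPOSE_STATES:
        raise ValueError(f"unknown compose status: {status!r}")
    # Only DOOMED composes are held back from the mirrors.
    return status != "DOOMED"


def doomed_locations(messages):
    """Collect the location of every compose that was not synced out.

    `messages` is assumed to be an iterable of dicts carrying 'status' and
    'location' keys; that shape is hypothetical, not taken from the fedmsg schema.
    """
    return [m["location"] for m in messages if not was_synced(m["status"])]
```

Anything `doomed_locations` returns is a compose worth digging into; everything else went out to mirrors, possibly with some failable pieces missing.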
So if you have a compose and you want to see why some part of it failed, you can look at the fedmsg, which provides a location URL. These URLs always follow the same format. Let’s look at an example: https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20161002.n.0/ The first directory means it’s a rawhide compose (there’s a Branched directory for the Branched Fedora 25 right now), and the particular compose is named Fedora-Rawhide-YYYYMMDD.n.0. That is year, month and day, then an ‘n’ to mean ‘nightly’ and a ‘0’ to mean it was the first compose of the day. If there are more composes that same day, they are in n.1, n.2, etc. For Alpha/Beta/Final releases there’s no ‘n’ there, as they are not nightly.
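If you want to pull those pieces out of a location URL in a script, something like the following Python sketch would do. The regular expression only encodes the naming rules described above (date, optional ‘n’, compose number); the function name and the returned dict keys are mine, not anything pungi provides.

```python
import re
from datetime import datetime
from urllib.parse import urlparse

# <release>-<YYYYMMDD>.[n.]<number>, following the format described above.
COMPOSE_ID_RE = re.compile(
    r"^(?P<release>.+)-(?P<date>\d{8})\.(?:(?P<nightly>n)\.)?(?P<number>\d+)$"
)


def parse_compose_location(url):
    """Split a compose location URL into its parts (hypothetical helper)."""
    # The compose ID is the last path component, with any trailing slash removed.
    compose_id = urlparse(url).path.rstrip("/").split("/")[-1]
    match = COMPOSE_ID_RE.match(compose_id)
    if match is None:
        raise ValueError(f"unrecognized compose ID: {compose_id!r}")
    return {
        "compose_id": compose_id,
        "release": match.group("release"),
        "date": datetime.strptime(match.group("date"), "%Y%m%d").date(),
        "nightly": match.group("nightly") is not None,
        "number": int(match.group("number")),
    }
```

For the example URL above this gives a release of `Fedora-Rawhide`, a date of 2016-10-02, `nightly` set to `True` and a compose number of `0`.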
The first place I go to look for issues is the global log file. This would be in logs/global/pungi.global.log and you will want to look for the image or deliverable you are interested in or search generally for tracebacks. Usually this will note a koji task id or build which you can then look up on koji. Since this is already getting long, I’ll post about tracking down koji build problems another day.
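The same search can be scripted once you have the log text in hand. This is only a rough Python sketch: it builds the global log URL from the compose location and picks out lines that either start a Python traceback or mention a keyword you choose. It assumes tracebacks appear with the standard Python header line (pungi is written in Python), and both function names are made up for this post.

```python
from urllib.parse import urljoin

# Standard first line of a Python traceback.
TRACEBACK_MARKER = "Traceback (most recent call last)"


def global_log_url(compose_url):
    """Return the URL of logs/global/pungi.global.log for a compose location."""
    base = compose_url if compose_url.endswith("/") else compose_url + "/"
    return urljoin(base, "logs/global/pungi.global.log")


def find_interesting_lines(log_text, keyword=None):
    """Return (line_number, line) pairs for tracebacks and keyword matches.

    The keyword match is case-insensitive; pass the name of the image or
    deliverable you care about.
    """
    needle = keyword.lower() if keyword else None
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if TRACEBACK_MARKER in line or (needle and needle in line.lower()):
            hits.append((number, line.rstrip()))
    return hits
```

Fetch the log however you like (a browser, curl, or `urllib.request`), pass the text to `find_interesting_lines`, and then read around the reported line numbers for the koji task id or build.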