CI and infrastructure hackfest 2017
I was hoping to write a bit of a summary for each day of the CI and Infrastructure hackfest last week, but there just wasn’t enough time to sit down and write up blog posts. Each day we started bright and early gathering in the hotel lobby at 8am, got to the Red Hat tower, got some breakfast and got to work until 5 or 6 and then we went and got dinner and back to the hotel to do it all over the next day. We definitely had some excellent discussions and got a lot of work done.
First a few thank yous: Red Hat provided us the use of the fittingly named “Fedora” room on the 9th floor to work in. It was a perfect size for our group. I think once we had to scare up an extra chair, but usually we fit just fine. OSAS (Red Hats Open Source And Standards group) funded travel and lodging for folks. Paul Frields (stickster) handled logistics and tried to keep us all scheduled, on track and in general cat herded in the right direction.
We had 3 BIG goals for this hackfest and a bunch of little ones. I think we did great on all counts. First on larger goals:
- Monday we got a detailed dump of information about our authentication systems, past, present and future. Both from a perspective of sysadmins wanting to manage and fix things and application developers wanting their app to authenticate right. The high level overview is that we have FAS2 currently as the source of all our authentication knowledge. It syncs (one way) to freeipa (a pair of replicated masters). This sync can sometimes fail, so we now know more about what to do in those cases from the sysadmin side. Then we have ipsilon that manages openid, openid-connect and (used to) handle persona. We got some detailed workflows for each of these cases. Moving forward we want to get apps using OpenID-Connect. Down the road we talked about replacing fas with a very thin layer community access api app. Not sure thats going to happen, but might be an interesting way to go.
- We wanted to harness the CentOS CI pipeline for testing a subset of optin packages from Fedora. I wasn’t directly working in this area, but I know by the end of the week we had CentOS-CI sending fedmsgs in our staging env and had setup a ci instance near it with resultsdb and taskotron to manage tests. I think there’s some more hooking up of things to go, but overall it should be pretty cool.
- Finally we wanted to look into and setup our own OpenShift instance. We had some very nice discussions with Red Hat OpenShift folks who manage and deploy things bigger than we can imagine. They gave us some very helpful information. Then we talked out initial policy questions and so forth. By friday we had a OpenShift cluster up and running in our staging env. We still need to get some official certs, sort out the ansible managing applications setup end, and figure out what we want to do for persistent storage, but otherwise we made a vast amount of progress. You can find out questions and answers at https://fedoraproject.org/wiki/Infrastructure/OpenShift
On smaller items:
- We met with the Red Hat Storage folks who manage our backend storage. There’s some nice improvements coming along later this year if we can adjust things to take advantage of it. Mostly that means splitting up our gigantic koji volume into a ‘archive’ and ‘active’ volumes and splitting our ftp volume into a ‘active’ and ‘archive’ volumes. Then we can hopefully move the active volumes over to flash storage, which should be a great deal faster. All still up in the air, but promising.
- We met with the folks who manage our main datacenter, and it’s looking possible that we will be moving all our stuff to a new area a short distance away. If this comes to pass, it would be later in the summer and there will likely be some downtime. On the plus side we would get a chance to organize machines better, get new racks and better power and all around be in a better place moving forward.
- We had a nice discussion around making bodhi faster. I’ve been over that many a time, but we did come up with a few new ideas, like generating drpms at update submission time instead of at mashing time (but that would need createrepo_c changes). Or perhaps triggering pushes only on critpath or security updates and other leaf node packages would just wait.
- There were a number of discussions around factory 2.0, modularity, branching, koji namespaces, etc.
- We had several discussions on postgres BDR (Bi directional replication). I’ve moved a number of our apps to it in staging and hoped to roll it out in production, but there were some concerns. In the end we decided to look at deploying it and then deciding on a app by app basis which ones were ready to move over. In the end we hope to have everything using it, but some apps need time to make sure they follow all of BDR’s rules and do the right thing. Additionally some apps may want to use BDR, but in sync mode to avoid possible bad data on a node crash. koji upstream needs to support things before we can move koji, but some of our smaller apps may be able to move soon. We also decided we wanted a script that could detect problems by looking at an applications schema. Pingou already went and had this done by the end of the week.
- Tim Flink (Fedora QA) and I had a nice discussion about scaling the existing QA setup. We agreed that it might make good sense to see if we could migrate this into our cloud, provided we can get it up on a modern version we could actually support. That is likely to happen in the next few months as we have some new hardware for it and are retiring some of the old hardware, so it’s a good time to force an new setup.
- We went to the ansible offices and had beer and pizza and watched a baseball game. 🙂
All in all a super productive week. Look for lots of the above to come up in meetings, tickets, mailing lists and so forth as we share plans we made, and make sure everyone is on board with them or if we need to adjust anything.