2017-11-06

EPEL mirror file layout changes


As several people have noted, the directory structure of EPEL has changed recently. The new layout may require changes to both (1) scripts written with hard-coded locations and (2) mirrors that were unable to get daily updates from the main mirrors. While the changes were communicated in meetings, I did not fully grasp their effects and so did not let mirrors and EPEL users know about them. As a result, this announcement was delayed by more than two weeks.

What Happened


The updates to the build system were made to add new features and to make the release engineering code more manageable. The old release layout that EPEL used for EL-6 and EL-7 differed from how all other releases were done and caused several problems for the release code and mirrors.


  1. Because all the files of a release were in one directory, any code that needed to stat(2) the directory made the server go over thousands of files before returning. Since EPEL accounts for a large volume of downloads, this hurt system performance, and servers mirroring the data could see long delays when rsyncing it down.
  2. The code that generated this layout was a 'special' case in the Fedora releng release process. It was fragile and tended to cause problems for updates and releases in both EPEL and regular Fedora releases.
  3. The layout differed from the current Fedora releases, so people grabbing software from multiple places also had to special-case their scripts.


During the updates to the release system for a new version of pungi, it was decided to remove this special case and have the build tools lay out all software Fedora creates in the same structure. This would hopefully make things much more maintainable and improve performance.

To transition safely, there was to be a period in which the old files remained on the server in the old trees and were also hardlinked to their new locations. This was intended to let mirrors pick up the files with the minimum amount of bandwidth. However, some problems showed up.
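A minimal Python sketch of the transition idea: the same file is reachable from both the old flat tree and a new alphabetical subtree because both names point at one inode, so no extra disk space or data is involved. The directory names (`old/`, `Packages/<letter>/`) and the package filename are hypothetical placeholders, not the actual EPEL paths. A mirror that rsyncs with `-H` (`--hard-links`) while both names exist can presumably recreate the second name locally instead of transferring the file again.

```python
import os
import tempfile


def hardlink_into_new_tree(root, filename):
    """Give a file in the (hypothetical) old flat tree a second name under
    Packages/<first letter>/ without copying any data."""
    old_path = os.path.join(root, "old", filename)
    new_dir = os.path.join(root, "Packages", filename[0].lower())
    os.makedirs(new_dir, exist_ok=True)
    new_path = os.path.join(new_dir, filename)
    os.link(old_path, new_path)  # both names now refer to the same inode
    return old_path, new_path


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "old"))
    name = "example-pkg-1.0-1.el7.noarch.rpm"  # made-up package name
    with open(os.path.join(root, "old", name), "wb") as f:
        f.write(b"placeholder rpm bytes")
    old, new = hardlink_into_new_tree(root, name)
    same_inode = os.stat(old).st_ino == os.stat(new).st_ino
    link_count = os.stat(new).st_nlink
    print(same_inode, link_count)  # True 2
```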


  1. As I said before, I didn't grasp that the change was going to affect EPEL and didn't communicate it to the lists.
  2. The hardlinks were removed after days rather than weeks. While most mirrors update daily, some only rsync weekly or monthly. Those mirrors missed the hardlinks completely and had to download the data twice.
  3. As the usual rule of three would have it, various top-level mirrors (mirrors.kernel.org and some others) had unrelated mirroring problems at the end of October. By the time these servers caught up with the new layout, the hardlinked files were gone. This meant that mirrors pulling data from a couple of tier-1 sites had very large downloads.
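The timing problem in item 2 comes down to simple arithmetic: a mirror only benefits from the hardlinks if at least one of its syncs lands while both paths exist. This Python sketch assumes a fixed sync interval and counts in whole days; the window of days 10 to 13 is a made-up example, not the real schedule.

```python
def caught_window(first_sync, interval, window_start, window_end):
    """True if a mirror syncing every `interval` days, starting on day
    `first_sync`, runs at least one sync between window_start and
    window_end (inclusive)."""
    if first_sync >= window_start:
        next_sync = first_sync
    else:
        # Ceiling division: how many intervals are needed to reach window_start
        steps = -(-(window_start - first_sync) // interval)
        next_sync = first_sync + steps * interval
    return next_sync <= window_end


# Hypothetical hardlink window: days 10 to 13; every mirror first syncs on day 0
for label, interval in [("daily", 1), ("weekly", 7), ("monthly", 30)]:
    print(label, caught_window(0, interval, 10, 13))
# daily True, weekly False (next sync is day 14), monthly False (day 30)
```

A weekly mirror whose cycle happens to start on day 3 would sync on day 10 and catch the window, which is why some infrequent mirrors were affected and others were not.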

How to deal with current things

The current layout should be 'solid' for the next couple of years. With packages broken down into alphabetical subtrees, the 'load' per server should not require another reorganization in the near future.
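To illustrate why alphabetical subtrees spread the load, here is a Python sketch that places each package in a directory named after the lowercased first character of its filename and reports the size of the busiest directory. The `Packages/<letter>/` naming is an assumption based on the Fedora-style layout, and the filenames are made up.

```python
from collections import Counter


def subtree_path(filename):
    """Place a package under Packages/<first character, lowercased>/."""
    return "Packages/{}/{}".format(filename[0].lower(), filename)


def busiest_directory_size(filenames):
    """Number of entries in the largest subtree, i.e. the worst-case
    directory a stat/readdir would have to walk."""
    counts = Counter(name[0].lower() for name in filenames)
    return max(counts.values())


packages = [
    "alpha-1.0.noarch.rpm",
    "atom-2.0.noarch.rpm",
    "nagios-4.3.4.noarch.rpm",
    "Zeta-0.1.noarch.rpm",
]
print(subtree_path("nagios-4.3.4.noarch.rpm"))  # Packages/n/nagios-4.3.4.noarch.rpm
print(busiest_directory_size(packages))  # 2 (the "a" subtree), versus 4 in one flat directory
```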

If you have written scripts that download a specific file from the mirrors (e.g. http://dl.fedoraproject.org/pub/archive/epel/5/i386/epel-release-5-4.noarch.rpm or a similar link), you should instead use a stable link such as http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm. The epel-release packages are updated regularly to pick up new macros or other changes, so linking to a specific file is very error-prone.
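For scripts, a minimal Python sketch of the stable-link approach: build the `epel-release-latest-<major>` URL and download whatever it currently points to. Only the `-7` URL comes from this post; assuming the same pattern holds for other EPEL major versions is a guess, and `epel_release_url` and `download_epel_release` are hypothetical helper names.

```python
import urllib.request

# Pattern taken from the stable EPEL-7 link above; other majors are assumed
STABLE_URL = "http://dl.fedoraproject.org/pub/epel/epel-release-latest-{major}.noarch.rpm"


def epel_release_url(major):
    """Stable link for the latest epel-release package of an EL major version."""
    return STABLE_URL.format(major=major)


def download_epel_release(major, dest):
    """Fetch whichever epel-release package the stable link currently points to."""
    urllib.request.urlretrieve(epel_release_url(major), dest)
    return dest


print(epel_release_url(7))
```

The script never needs to know the current release number, so it keeps working when epel-release is updated.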

Otherwise, use yum/dnf commands to get the files from the mirrors. This matters for mirror sites that may alter the directory structure themselves; for them, only the repodata is a 'safe' guide to what to download.
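As an illustration of letting the repodata decide what to download, here is a Python sketch that reads package names and their `location href` paths from a `primary.xml` document. It assumes you have already found and decompressed the primary file listed in `repodata/repomd.xml`; the sample XML is a trimmed, made-up example, and real files carry many more elements. In normal use yum/dnf do this for you.

```python
import xml.etree.ElementTree as ET

# Default namespace used by primary.xml package entries
NS = {"common": "http://linux.duke.edu/metadata/common"}


def package_locations(primary_xml):
    """Map each package name to the path the repodata says it lives at."""
    root = ET.fromstring(primary_xml)
    locations = {}
    for pkg in root.findall("common:package", NS):
        name = pkg.findtext("common:name", namespaces=NS)
        href = pkg.find("common:location", NS).get("href")
        locations[name] = href
    return locations


# Made-up, trimmed sample of a primary.xml document
SAMPLE = """<metadata xmlns="http://linux.duke.edu/metadata/common" packages="1">
  <package type="rpm">
    <name>example-pkg</name>
    <location href="Packages/e/example-pkg-1.0-1.el7.noarch.rpm"/>
  </package>
</metadata>"""

print(package_locations(SAMPLE))
```

Because the paths come from the repodata, the same code works whether a mirror uses the upstream layout or its own.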

2017-10-03

Ansible RPMS are no longer in EPEL-7

Ansible packages are no longer shipped in EPEL-7 as they have been included in Red Hat Enterprise Linux Extras (and similarly in CentOS-7 and hopefully Scientific Linux 7.4).

Systems using either Amazon Linux or a Red Hat Enterprise Linux EUS release of 7.2/7.3 will need to get packages directly from Ansible at

http://releases.ansible.com/ansible/rpm/

My thanks to the Ansible maintainer, Kevin Fenzi, for keeping the package in EPEL for the last several years.

2017-10-02

Nagios being updated to 4.3.4 in EPEL and Fedora

It took me longer than I wanted, but I now have a testing candidate for nagios-4.3.4 in EL6, EL7, F25, F26, F27 and rawhide. This will fix the security problem described in CVE-2017-14312.

I have also made a couple of changes to the RPM, as rpmlint pointed out that libnagios.a should be in the -devel package and that various contrib items needed to be packaged up in a similar subpackage. I expect I missed something, so please test and let me know so I can get this published as soon as possible.


2017-09-06

Flock 2017 : Summary

The trip to FLOCK 2017 in Cape Cod was a nice excursion where I learned a lot. I had not been able to go to the two previous Flocks in Rochester, NY or Poland, so I was not up to date on many things. It was very nice to see many people I had not seen in 2 years and to catch up on many projects that I had heard of, and even installed servers for, but knew few details about.

The days were mostly a blur of a couple of talks per day, a lot of hallway track, and helping with a couple of outages that were happening. So the following is a shortened summary:

Monday: Day 0

I posted on this earlier. The day was a pretty good one and I got to let someone else drive through Massachusetts traffic.

Tuesday: Day 1

I wanted to make sure I did not sleep through the opening day talks (something I have been known to do), so I got up extra early, had a big breakfast with some guests from Europe, and made it to a seat up front. Matthew Miller gave a nice talk on the status of Fedora and was able to show some pretty pictures from data I helped collect. After trying to advertise the EPEL State of the Union talk, I went to some hallway meetings and talked with kernel developers, FESCo members and various others about x86_32 support in Fedora. This was to report back to the x86 committee at a meeting on 2017-09-06.
Later in the day, I went to see Tom Callaway give a talk on licenses and the importance of a strong liver when dealing with them. It was interesting to see how far we have come over so many years. I had hoped to then go to the Fedora on Windows subsystem talk, as I have been using Cygwin on Windows for years and wanted to see how this worked too. However, a work item came up and I was pretty much booked until later in the evening.

Wednesday: Day 2

This was the day of the EPEL State of the Union talk. I spent the morning working on a blog post about everything I was going to say... only to hit CTRL-A, Backspace at the wrong moment. Goodbye writing. I am going to go over the particulars in a different post. The two talks went pretty well, but I need to go over the videos to see what I actually said versus what I think I said. After the talks, I got to ride in a Tesla and play various boardwalk games at a nice retro playplace. I finally went back and crashed for a bit, but woke up and had insomnia until 4am.

Thursday: Day 3

The start of this day was rough. I was really, really tired and almost fell asleep at the Fedora Infrastructure State of the Union talk. I went back to the room at 1300 for a power nap and woke up after 1700. I went to see if anything was still going on and had some more hallway talks about EPEL and other architectures. I finally went back to bed at 2200 and slept soundly.

Friday: Day 4

I had a nice breakfast with most of the Fedora Infrastructure team, and then did a fast jog to catch my bus to the airport. The bus ride was supposed to be 90 minutes, which would have given me 2 hours to get through security. Sadly, the Friday before Labor Day weekend does not make for a 90-minute bus ride. After three hours and then some, I got to the airport just in time for a very last-minute dash through security and everything else. I got onto the plane before the doors closed, and flew home to be greeted by the last remnants of Hurricane Harvey. We only had 40 minutes of rain from it, but even that smidgen of what eastern Texas got was incredibly heavy rain and hail. I got home and crashed.

Fedora Project Outage RCA :: DNS Outage 2017-09-06


Early on 2017-09-06, many people attempting to reach fedoraproject.org
found that it had disappeared from the internet. People attempting to
do 'yum/dnf install', browse the website, or other Internet-related
activities got various error messages saying that the sites no
longer existed in DNS. Some people had no difficulty and were not
able to reproduce the problem, but anyone using a DNS server with
DNSSEC validation turned on was unable to resolve any IP addresses
related to the site.

The problem was due to a misconfigured record in the registrar's DNS
data. The previous week, the registrar had added multiple records to
the DNS data in the .org. zone. These were the DNSSEC (DS) records for
fedorapeople.org, fedorahosted.org, and fedoraproject.org, and the
registrar had added all of them to fedoraproject.org. instead of each
to its correct zone. On seeing this, I asked for two of the records to
be removed, and somehow confused which one was to stay. This meant
that the key meant for fedorahosted.org. was left for
fedoraproject.org, and the fedoraproject.org and fedorapeople.org keys
were removed.
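To show why a DS record for the wrong zone breaks validation, here is a Python sketch of the SHA-256 DS digest check (digest type 2, RFC 4034/RFC 4509): the digest covers the DNSKEY owner name followed by the DNSKEY RDATA, so a DS created for fedorahosted.org can never match a key served by fedoraproject.org. The key bytes are placeholders, not Fedora's real keys, and a real validator also checks signatures, algorithms and key tags.

```python
import hashlib
import struct


def name_to_wire(name):
    """Canonical DNS wire format: lowercase length-prefixed labels, then a zero byte."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"


def dnskey_rdata(flags, algorithm, public_key):
    """DNSKEY RDATA: 16-bit flags, protocol (always 3), algorithm, key material."""
    return struct.pack("!HBB", flags, 3, algorithm) + public_key


def ds_digest_sha256(owner, rdata):
    """DS digest type 2: SHA-256 over the DNSKEY owner name followed by its RDATA."""
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()


def zone_validates(zone, parent_ds_digests, zone_dnskeys):
    """The chain of trust holds only if some DS in the parent matches some DNSKEY of the zone."""
    served = {ds_digest_sha256(zone, key) for key in zone_dnskeys}
    return any(digest in served for digest in parent_ds_digests)


# Placeholder keys (flags 257 = key-signing key, algorithm 8 = RSA/SHA-256)
project_key = dnskey_rdata(257, 8, b"placeholder-fedoraproject-key")
hosted_key = dnskey_rdata(257, 8, b"placeholder-fedorahosted-key")

wrong_ds = [ds_digest_sha256("fedorahosted.org", hosted_key)]   # what was left in .org
right_ds = [ds_digest_sha256("fedoraproject.org", project_key)]  # what should have been there

print(zone_validates("fedoraproject.org", wrong_ds, [project_key]))  # False
print(zone_validates("fedoraproject.org", right_ds, [project_key]))  # True
```

With no matching DS, a validating resolver treats the zone as bogus and returns no answers, which matches the behaviour seen from validating resolvers such as 8.8.8.8.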

When the registrar updated its .org. data early (UTC) on 2017-09-06,
validating DNS servers such as Google's 8.8.8.8 no longer returned any
addresses from Fedora's DNS zones. Other DNS servers also stopped
resolving these names, and members of the IETF DNSSEC group came in to
help in case some other problem was going on.

After diagnosing the problem, Fedora IT contacted the registrar and
got the correct DNSSEC keys added to the master table. This fixed
resolution on many DNS servers, but some cache the broken data for up
to the 24-hour TTL, so some users were still having problems as of
2200 UTC on 2017-09-06. A temporary workaround is to hard-code the
main proxy's IP address in /etc/hosts; however, this can cause
problems later if the entry is not removed and that proxy is down for
maintenance.

I would like to thank the members of the IETF dnssec group who took
the time out to help us through this problem. I would also like to
apologize to everyone who had disruption due to this.

2017-08-28

Flock 2017: Day 0

Today (2017-08-28) was the day before the official beginning of Flock 2017, which is being held in Cape Cod, Massachusetts. This is the first Flock I have been able to attend in 2 years, so it has been a lot of catching up with old friends.

The day started off pretty well, with only the usual planes, trains and automobiles problems. The airport kept having dyslexia problems, sending people to gate C-12 for a flight at C-15 and to C-15 for a flight at C-12. The attendants could not correct the problem because the airport runs the consoles. After an hour of calls and people running back and forth, the signs were finally updated 5 minutes before the flight was ready to board. That led to the next fun problem for the poor attendant: the plane we were supposed to fly on had mechanical difficulties, and the airline had to do a last-minute replacement with a slightly smaller plane. This meant that all the seats had to be moved around and everyone needed new tickets.

The plane flight was pretty uneventful, and when I arrived I ran into Zonker Harris, who was headed to Walden Pond for a bit of sunbathing. This solved the trains and automobiles problems, and we took the Interstate and other roads to Cape Cod. The drive itself was fine, though it did remind me that Massachusetts is the one state that makes turn signals optional car equipment and car horns extra loud.

Flock is being held at a nice golf resort in Hyannis. My room is on the second floor, and it was nice to hear seagulls in the distance. For dinner I had cod at the in-house bar, and tonight I am working on adding better pictures to my Wednesday Flock presentations.

This evening, I am listening to Lynyrd Skynyrd, who are playing next door. I expect "Free Bird" will be the closing song.

2017-06-22

Problems with EPEL and Fedora mirroring: Many Root Cause Analysis

For the last 24 hours, there has been a problem with EPEL and Fedora mirrors where people getting updates would see errors like:

Updateinfo file is not valid XML:

The problem was caused by a bug in the compose that wrote the updateinfo file out as SQLite instead of XML. It was fixed within a couple of hours on the Fedora side, but it has taken a lot longer to clear up further downstream.
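A test along these lines could catch a bad compose before it reaches mirrors. This Python sketch decompresses an updateinfo payload (gzip, xz or bzip2, detected by magic bytes) and reports whether it is an SQLite database, well-formed XML, or neither. It is an illustration only, not the check Fedora actually added, and `updateinfo_kind` is a hypothetical name.

```python
import bz2
import gzip
import lzma
import xml.etree.ElementTree as ET

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite database


def decompress(data):
    """Undo gzip, xz or bzip2 compression based on the file's magic bytes."""
    if data.startswith(b"\x1f\x8b"):
        return gzip.decompress(data)
    if data.startswith(b"\xfd7zXZ\x00"):
        return lzma.decompress(data)
    if data.startswith(b"BZh"):
        return bz2.decompress(data)
    return data


def updateinfo_kind(data):
    """Classify a (possibly compressed) updateinfo payload."""
    raw = decompress(data)
    if raw.startswith(SQLITE_MAGIC):
        return "sqlite"
    try:
        ET.fromstring(raw)
    except ET.ParseError:
        return "invalid"
    return "xml"


print(updateinfo_kind(gzip.compress(b"<updates/>")))   # xml
print(updateinfo_kind(gzip.compress(SQLITE_MAGIC)))    # sqlite
```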

  • Some of the Fedora mirror containers were not updating correctly. We use a docker container on each proxy to keep the data fresh. Roughly 4 of the 14 proxies reported that they were updating but apparently were not. These servers were our main IPv6 servers, so people getting updates from them were more affected than other users.
  • Some mirrors only update once or twice a day (or even less often). This means that your favourite mirror may keep the bad data for 12 to 48 hours.
  • Some client plugins like to pin to the quickest mirror to keep downloads fast. While we may tell you that there are 20 mirrors up to date, the plugin will use the one it has downloaded from fastest in the past. This means you can end up going to a 'broken' mirror for a lot longer.
  • Some yum/dnf systems seem to have other options set (for example a long metadata_expire) that keep the bad XML file until it 'ages' out. This means that even after an updated XML file is available, some systems keep complaining because they still have the bad copy cached.
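The mirror-pinning problem can be sketched in Python: picking purely by past speed can keep returning a stale mirror, whereas first filtering to mirrors whose repomd revision matches the newest one seen avoids it. The mirror names, latencies and revision numbers are hypothetical, and this is not how any particular yum/dnf plugin is implemented.

```python
def pick_fastest(mirrors):
    """Pin to the lowest-latency mirror, ignoring how fresh its metadata is."""
    return min(mirrors, key=lambda m: m["latency_ms"])["name"]


def pick_fastest_fresh(mirrors):
    """Prefer the fastest mirror whose repomd revision matches the newest one seen."""
    newest = max(m["revision"] for m in mirrors)
    fresh = [m for m in mirrors if m["revision"] == newest]
    return pick_fastest(fresh)


# Hypothetical mirrors: mirror-a is fastest but still serves the old (broken) metadata
MIRRORS = [
    {"name": "mirror-a", "latency_ms": 12, "revision": 1498000000},
    {"name": "mirror-b", "latency_ms": 40, "revision": 1498100000},
    {"name": "mirror-c", "latency_ms": 25, "revision": 1498100000},
]

print(pick_fastest(MIRRORS))        # mirror-a
print(pick_fastest_fresh(MIRRORS))  # mirror-c
```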
On the Fedora side, the fix is to add better tests so that this does not happen again. On the client side, the current fix is to run either of the following:

  • yum clean all
  • yum clean metadata
Thank you all for your patience on this problem.